Abstract

This paper presents a technique for the optimization of bound queries over disjunctive
deductive databases with constraints. The proposed approach is an extension of the
well-known Magic-Set technique and is well-suited for being integrated in current
bottom-up (stable) model inference engines. More specifically, it is based on the
exploitation of binding propagation techniques which reduce the size of the data relevant
to answer the query and, consequently, reduce both the complexity of computing a single
model and the number of models to be considered. The motivation of this work stems from
the observation that traditional binding propagation optimization techniques for
bottom-up model generator systems, simulating the goal-driven evaluation of top-down
engines, are only suitable for positive (disjunctive) queries, while hard problems are
expressed using unstratified negation.

The main contribution of the paper consists in the extension of a previous technique,
defined for positive disjunctive queries, to queries containing both disjunctive heads
and constraints (a simple and expressive form of unstratified negation). As the usual way
of expressing declaratively hard problems is based on the guess-and-check technique, where
the guess part is expressed by means of disjunctive rules and the check part is expressed
by means of constraints, the technique proposed here is highly relevant for the
optimization of queries expressing hard problems. The value of the technique is supported
by several experiments.



1 Introduction 

Disjunctive Datalog programs, i.e. programs allowing clauses to have both disjunction
in their heads and negation in their bodies (in short, Datalog$^{\vee,\neg}$ programs),
have been successfully introduced with the aim of modelling incomplete data
(Lobo et al., 1992). Over the last few years, there has been a great deal of interest
in studying declarative semantics for Datalog$^{\vee,\neg}$ programs. In fact, minimal
model semantics (Grant & Minker, 1986) is widely accepted for programs in the absence of
body negation (Datalog$^{\vee}$ programs), and is naturally extended for programs with
(possibly unstratified) negation. For Datalog$^{\vee,\neg}$ programs there are several
extensions of the minimal model semantics, such as disjunctive stable models
(Przymusinski, 1991; Gelfond & Lifschitz, 1991), minimal founded semantics
(Furfaro et al., 2004), and different proposals of well-founded semantics (see, e.g.,
(Ross, 1990; Brass & Dix, 1999; Przymusinski, 1995; Eiter et al., 1997b;
Baral et al., 1992; Wang, 2000)).

* A preliminary version of this paper was presented at the LPAR'02 Conference (Greco et al.

For each of the above semantics, several algorithms and techniques for model
generation have also been proposed. Model generation is often carried out through
bottom-up evaluation of clauses (Minker & Rajasekar, 1990; Brass & Dix, 1994;
Fernandez & Minker, 1995a; Leone et al., 2002; Simons et al., 2002).

For instance, bottom-up methods have been employed to compute perfect and
stable models (Fernandez & Minker, 1995a; Fernandez & Minker, 1995b) using ordered
model trees, and to process logic programs under the minimal model semantics
using a tableau calculus (Niemelä, 1996).

This paper focuses on stable model semantics, according to which a disjunctive
program may have several alternative models, called answer sets, each one
corresponding to a possible view of the world; see (Niemelä, 2003) for a recent
overview of answer set programming. Disjunctive logic programs under stable model
semantics are very expressive, since they capture the complexity class $\Sigma_2^P$
(Eiter et al., 1997a), and they have been used for developing several practical
applications. Furthermore, in the last few years several efficient inference engines
implementing stable model semantics have been developed. The DLV system
(Leone et al., 2002) and the GnT system (Janhunen et al., 2000) implement disjunctive
stable models, while for the non-disjunctive case many other engines are currently
available (see, e.g., Smodels (Syrjänen & Niemelä, 2001), DeReS
(Cholewinski et al., 1996), and ASSAT (Lin & Zhao, 2002)).

Even though model generator techniques for stable model semantics are quite 
useful for knowledge representation and reasoning tasks, it is well-known that they 
are inefficient when used for refutations, i.e. query answering. In fact, they often 
explore a search space much larger than that required, and tend to generate answers 
to all the possible queries rather than to the precise query. 

However, it is often the case that only a strict subset of the stable models needs 
to be considered and that there is no need to compute these models in their entirety 
to answer the query. This intuition is exploited by top-down techniques which only 
consider atoms necessary to answer the current query and outperform model gen- 
erators used for refutation. In fact, top-down approaches systematically utilize the 
query to propagate the binding into the body of the rules to avoid computing all 
the models of the program — a brief overview of top-down (disjunctive) reasoning 
is presented in the next section. 

In order to optimize query evaluation, while still preserving the ability to compute
models (well-suited for complex reasoning tasks), several works proposed the
simulation of top-down strategies, by means of suitable transformations introducing
new predicates and rewriting some clauses.

Among them, the Magic-Set technique is the best known, owing to its efficiency and
its generality, even though other focused methods, such as the supplementary
Magic-Set and other special techniques for linear and chain queries, have also been
proposed (see, e.g., (Beeri & Ramakrishnan, 1991; Greco et al., 1995; Ullman, 1989a;
Ramakrishnan et al., 1993)).

It should be recalled that the formal equivalence between top-down evaluation
with memoing and bottom-up evaluation with Magic-Set optimization is well-known
(Ullman, 1989a; Ullman, 1989b). However, this Magic-Set optimization technique
is only suitable for positive Datalog queries, i.e. queries without disjunction and
negation. To the best of our knowledge, the first extension of the Magic-Set technique
for the evaluation of disjunctive Datalog$^{\vee}$ queries, in which negation is not
allowed, was introduced in (Greco, 1999) and extended in (Greco, 2003).

This paper further investigates optimization techniques simulating the top-down
evaluation of queries, by extending the proposal presented in (Greco, 2003) from
Datalog$^{\vee}$ programs to Datalog$^{\vee,\neg}$ programs. Actually, a syntactic
restriction of Datalog$^{\vee,\neg}$ programs is considered in which unstratified
negation can only be expressed in the form of constraint rules. Notice that this is not
truly a restriction. In fact, constraints represent a natural way to extend database
semantics, by explicitly defining properties which are supposed to be satisfied by all
instances over a given database schema. Moreover, the usual way of expressing
declaratively hard problems, such as NP problems and problems in the second level of the
polynomial hierarchy ($\Sigma_2^P$ and $\Pi_2^P$ problems), is based on the
guess-and-check technique where the guess part is expressed by means of disjunctive rules
and the check part is expressed by means of constraints. Therefore, the technique
proposed here is highly relevant for the optimization of queries expressing hard problems.

Even though constraints can be easily managed in top-down approaches, it should 
be pointed out they represent a serious issue when simulating top-down reasoning 
by means of the Magic-Set technique. In fact, all the rewriting techniques presented 
so far in the literature cannot be straightforwardly extended to constraint rules. 



1.1 Contributions

The main contributions of the paper are as follows. 

• A query rewriting algorithm is defined which allows the simulation of top-down
computation in bottom-up inference engines by propagating the bindings from the
query-goal into the body of rules. The rewriting technique avoids the computation
of complete models and of useless models which are not necessary to answer the
query. Essentially, the technique is an adaptation of the Magic-Set technique
(Bancilhon et al., 1986; Beeri & Ramakrishnan, 1991) to disjunctive Datalog
programs with constraints.

• It is observed that the application of the Magic-Set technique to queries with
constraints produces a query that can be evaluated more efficiently but that, in
general, is not equivalent to the original one. The conditions that a program must
satisfy in order to preserve this equivalence are investigated.

• The proposed algorithm is an extension of the one in (Greco, 2003), which was
defined for Datalog$^{\vee}$ programs only. In addition, an in-depth analysis of the
algorithm in (Greco, 2003) is provided, formally showing that it works independently
of the particular strategy adopted for simulating the binding propagation occurring in
top-down evaluation. In this way, the approach is orthogonal to the Magic-Set
technique, since it can exploit any other rewriting strategy proposed in the literature.

• In order to verify the validity of this approach, the technique has been tested
with different disjunctive programs using the DLV system (Leone et al., 2002). The
experiments, comparing the execution time required to answer the source and the
optimized query, gave very encouraging results that support the value of the approach.

Even though there are a number of proposals for efficient query answering in
disjunctive deductive databases under well-founded semantics, there are few effective
techniques for top-down reasoning under stable model semantics. Thus, the paper
also contributes an effective, implemented technique for query answering
under stable model semantics.

Finally, it should be stressed that even though the results in (Ullman, 1989a;
Ullman, 1989b) suggest that this extension of the Magic-Set technique is equivalent
to the form of binding propagation occurring in the top-down evaluation, the
technique may suffer from some potential inefficiency due to the computation of
additional predicates and rules needed in the rewriting (intrinsic in the Magic-Set
technique). However, note that some redundancy can also occur in the top-down
evaluation due to the tabling of intermediate results.

However, the main aim here is to formally prove the possibility of efficient query 
evaluation in bottom-up systems, breaking down the complexity of a brute force 
approach and still using the internal model generator. Thus, implementations of 
top-down systems supporting disjunction and stable model semantics have not been 
sought in order to make an experimental comparison w.r.t. this approach. This 
aspect calls for further study and research. 



1.2 Related Work on the Evaluation of Disjunctive Programs

The (efficient) evaluation of disjunctive logic programs has a long history
(Grant & Minker, 1986; Yahya & Henschen, 1985; Liu & Sunderraman, 1990;
Yuan & Chiang, 1989). Recently there has been a growing interest in answering
queries on disjunctive deductive databases. The two main approaches proposed in
the literature for the evaluation of queries and programs are now discussed.



Optimization of Bound Disjunctive Queries with Constraints 5 

1.2.1 Bottom-Up Techniques

The definition of efficient bottom-up evaluation algorithms for assigning semantics 
to disjunctive deductive databases has been the subject of several proposals. In the 
following some of these approaches proposed in the literature are briefly described. 
(Minker & Rajasekar, 1990) introduce the concept of state, consisting of a set
of positive disjunctions, as the domain of a fixpoint operator that gives semantics 
to disjunctive logic programs. The fixpoint computation operates bottom-up and 
produces, as the resulting fixpoint, the model state, i.e. a state whose minimal 
models satisfy the disjunctive deductive database. 

(Brass & Dix, 1994) propose a general approach for defining the semantics of
disjunctive logic programs. The framework consists of: a semantical part, where
the declarative meaning of a program is defined in an abstract way as the weakest
semantics satisfying certain properties, and a procedural part, namely a bottom-up
query evaluation method based on operators working on conditional facts. More 
specifically, the approach is based on the generation of a residual program, i.e. a 
program obtained by transforming the original one, which makes the use of dis- 
junctive information explicit. 

(Fernandez & Minker, 1995a) introduce a new fixpoint characterization of the
minimal models of disjunctive logic programs. The proposed operator, applied iter- 
atively, is shown to characterize the perfect model semantics of stratified disjunctive 
logic programs. Based on these results the authors present a bottom-up evaluation 
algorithm for stratified disjunctive deductive databases that uses the model-tree 
data structure to both represent the information contained in the database and 
compute answers to queries. 

(Leone et al., 2002) propose the DLV system, which exploits an algorithm for
computing stable models for disjunctive logic programs. This approach searches
for stable models by using efficient fixpoint algorithms computing the semantics of
programs. In particular, it achieves good performance by using an (intelligent)
ground instantiation of programs, i.e. a ground program from which unsatisfiable
ground rules are deleted, and by using heuristics in a backtracking search strategy
to prune the search space.

(Simons et al., 2002) define a novel answer set programming language that generalizes
normal logic programs. The language allows weighted constraint rules, which
increase the expressivity of the language so that it can express different types of
constraints (e.g. cardinality conditions) and optimization capabilities. The
declarative semantics extends the one for normal programs, while the complexity of
computing stable models for this novel language is comparable to that of normal
programs (without considering optimizations).

1.2.2 Top-Down Techniques

Top-down query evaluation is based on refutation procedures. The first of such
refutation techniques, called SLD-resolution, was introduced in (Kowalski, 1974) and
is suitable for Datalog programs only, i.e. programs without negation and disjunction.
(Clark, 1978) extended SLD-resolution to SLDNF-resolution, introducing the
negation-as-failure rule for inferring negative information.

In order to prevent the possibility of infinite loops and a large amount of redundant
sub-derivations (intrinsic in SLD-resolution), SLG-resolution was introduced
(Chen & Warren, 1993; Chen & Warren, 1996). It is a tabling mechanism
for the evaluation of the well-founded semantics of logic programs (without
disjunction), and it is the evaluation strategy underlying XSB, the best known
state-of-the-art top-down tabling system (Sagonas et al., 1994). (Shen et al., 2002)
present an optimization of SLG-resolution, called SLT-resolution.

The problem of top-down computation for disjunctive well-founded semantics
(DWFS) was investigated in (Wang, 2001). Specifically, the author proposes a
top-down procedure for disjunctive well-founded semantics called D-SLS-resolution,
which can possibly be optimized by employing techniques such as the tabling
method of (Chen & Warren, 1996). A top-down method for testing DWFS membership,
based on the characterization of the DWFS in terms of Gelfond-Lifschitz
transformations, is presented in (Johnson, 2001).

A bottom-up procedure computing queries in a top-down fashion has also been
proposed for minimal model semantics (Yahya, 2000; Yahya, 2002). The approach,
suitable for positive queries, is based on the duality principle for interpreting logical 
connectives. In more detail, the duality transformation is obtained by reversing the 
direction of the implication arrows in the clauses representing both the program and 
the negation of the query goal. The application of a generic bottom-up procedure 
to the transformed clause set results in a top-down query answering. 

Finally, (Johnson, 1999) shows that disjunctive stable models can be characterized
in terms of cyclic covers, and in particular that such covers provide a powerful
technique for characterizing the properties of query processing, compilation and
view updating. A (correct and terminating) top-down method for query processing
under the disjunctive stable model semantics is also presented. Supported covers
have also been used to facilitate the top-down query processing under the possible
model semantics (Johnson, 2002).

1.3 Plan of the Paper

The paper is organized as follows: Section 2 presents preliminary definitions and
results on disjunctive Datalog, minimal and stable model semantics and Magic-Set
rewriting; Section 3 presents the Magic-Set rewriting for positive disjunctive
queries; Section 4 extends the binding propagation technique to disjunctive queries
with constraints; Section 5 presents experimental results showing the validity of the
proposed technique; finally, Section 6 presents the conclusions.

2 Preliminaries

In this section standard concepts on Datalog, Magic-set rewriting, query equival- 
ence, disjunction and constraints are reviewed. 

2.1 Disjunctive Datalog 

The existence of alphabets of constants, variables and predicate symbols is
assumed. A term is a constant or a variable. An atom is of the form $p(t_1, \dots, t_k)$
where $p$ is a $k$-ary predicate symbol and $t_1, \dots, t_k$ are terms. A literal is an
atom $A$ or its negation $\neg A$. A Datalog$^{\vee,\neg}$ (or simply disjunctive
Datalog) rule $r$ is a clause of the form:

$a_1 \vee \dots \vee a_m \leftarrow b_1, \dots, b_k, \neg c_1, \dots, \neg c_n$

where $n, k, m \geq 0$, $n + k + m > 0$ and $a_1, \dots, a_m, b_1, \dots, b_k, c_1, \dots, c_n$
are function-free atoms. The disjunction $a_1 \vee \dots \vee a_m$ is called the head of
$r$ and is denoted by $Head(r)$, while the conjunction
$b_1, \dots, b_k, \neg c_1, \dots, \neg c_n$ is called the body and is denoted by
$Body(r)$. If $m = 1$, then $r$ is normal (i.e. $\vee$-free) or Datalog$^{\neg}$; if
$n = 0$, then $r$ is positive (i.e. $\neg$-free) or Datalog$^{\vee}$; if both $m = 1$ and
$n = 0$, then $r$ is normal and positive or Datalog; if $k = n = 0$, then $r$ is a fact,
whereas if $m = 0$, then $r$ is a constraint or denial rule, i.e. a rule which is
satisfied only if $Body(r)$ is false. A Datalog$^{\vee,\neg}$ program $\mathcal{P}$ is a
finite set of Datalog$^{\vee,\neg}$ rules; it is normal (resp. positive) if all its rules
are normal (resp. positive). Given a program $\mathcal{P}$ and a predicate symbol $g$
occurring in $\mathcal{P}$, the definition of $g$, denoted by $def(g, \mathcal{P})$,
consists of all rules in $\mathcal{P}$ having $g$ in their heads.
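
The rule classes above depend only on the three counts m, k and n. As a minimal sketch (the `Rule` class and the `classify` function are hypothetical names introduced here for illustration, not part of the paper), the case analysis can be written as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for a rule a_1 v ... v a_m <- b_1..b_k, not c_1..c_n."""
    head: tuple = ()  # a_1, ..., a_m (disjunction)
    pos: tuple = ()   # b_1, ..., b_k (positive body atoms)
    neg: tuple = ()   # c_1, ..., c_n (atoms occurring under negation)


def classify(rule):
    """Return the labels of Section 2.1 that apply to the rule."""
    m, k, n = len(rule.head), len(rule.pos), len(rule.neg)
    if m + k + n == 0:
        raise ValueError("a rule needs at least one atom (n + k + m > 0)")
    labels = set()
    if m == 1:
        labels.add("normal")      # no disjunction in the head
    if n == 0:
        labels.add("positive")    # no negation in the body
    if k == 0 and n == 0:
        labels.add("fact")        # empty body
    if m == 0:
        labels.add("constraint")  # empty head: the body must be false
    return labels
```

Note that the labels are not mutually exclusive: a disjunctive fact such as a ∨ b ← is both positive and a fact.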

The Herbrand Universe $U_{\mathcal{P}}$ of a program $\mathcal{P}$ is the set of all
constants appearing in $\mathcal{P}$, and its Herbrand Base $B_{\mathcal{P}}$ is the set
of all ground atoms constructed from the predicates appearing in $\mathcal{P}$ and the
constants from $U_{\mathcal{P}}$. A term (resp. an atom, a literal, a rule or a program)
is ground if no variables occur in it. A rule $r'$ is a ground instance of a rule $r$ if
$r'$ is obtained from $r$ by replacing every variable in $r$ with some constant in
$U_{\mathcal{P}}$. We denote by $ground(\mathcal{P})$ the set of all ground instances of
the rules in $\mathcal{P}$.
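
To make the grounding step concrete, the following sketch computes the Herbrand Universe and all ground instances of a small program. The encoding is an assumption made for illustration only: an atom is a pair (predicate, arguments), a rule is a triple (head, positive body, negated body), and variables are strings starting with an upper-case letter.

```python
from itertools import product


def is_variable(term):
    # Assumed convention: variables are strings starting with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()


def terms_of(rule):
    """All terms occurring in the head, positive body and negated body of a rule."""
    return [t for part in rule for (_, args) in part for t in args]


def herbrand_universe(program):
    """The set of constants appearing in the program."""
    return {t for rule in program for t in terms_of(rule) if not is_variable(t)}


def ground(program):
    """All ground instances of the rules, obtained by substituting constants."""
    universe = sorted(herbrand_universe(program), key=str)
    instances = set()
    for rule in program:
        variables = sorted({t for t in terms_of(rule) if is_variable(t)})
        for values in product(universe, repeat=len(variables)):
            theta = dict(zip(variables, values))
            instances.add(tuple(
                tuple((pred, tuple(theta.get(t, t) for t in args))
                      for (pred, args) in part)
                for part in rule
            ))
    return instances


# Toy program (illustrative): the fact e(1,2) and the rule p(X) <- e(X,Y).
program = [
    ((("e", (1, 2)),), (), ()),
    ((("p", ("X",)),), (("e", ("X", "Y")),), ()),
]
ground_program = ground(program)
```

With two constants and two variables, the second rule has four ground instances, which illustrates why engines try to avoid a naive instantiation.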

Given a set of ground atoms $I$, a program $\mathcal{P}$, a predicate symbol $g$ and an
atom $g(t)$, $I[g]$ denotes the set of $g$-atoms in $I$, whereas $I[\mathcal{P}]$ denotes
the set of atoms in $I$ whose predicate symbol appears in the head of some rule of
$\mathcal{P}$. Given a set of interpretations $S$, then $S[g] = \{M[g] \mid M \in S\}$
and $S[\mathcal{P}] = \{M[\mathcal{P}] \mid M \in S\}$.

An interpretation of $\mathcal{P}$ is any subset of $B_{\mathcal{P}}$. The value of a
ground atom $L$ w.r.t. an interpretation $I$, $value_I(L)$, is 1 (true) if $L \in I$ and
0 (false) otherwise. The value of a ground negated literal $\neg L$ is $1 - value_I(L)$.
The truth value of a conjunction of ground literals $C = L_1, \dots, L_n$ is the minimum
over the values of the $L_i$, i.e.
$value_I(C) = \min(\{value_I(L_i) \mid 1 \leq i \leq n\})$, while the value of a
disjunction $D = L_1 \vee \dots \vee L_n$ is their maximum, i.e.
$value_I(D) = \max(\{value_I(L_i) \mid 1 \leq i \leq n\})$; if $n = 0$, then
$value_I(C) = true$ and $value_I(D) = false$. A ground rule $r$ is satisfied by $I$ if
$value_I(Head(r)) \geq value_I(Body(r))$. Thus, a rule $r$ with empty body is satisfied
by $I$ if $value_I(Head(r)) = true$. An interpretation $M$ for $\mathcal{P}$ is a model
of $\mathcal{P}$ if $M$ satisfies all rules in $ground(\mathcal{P})$.
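
The satisfaction condition can be checked mechanically. The sketch below is an illustration under an assumed encoding (a ground rule is a triple of tuples of atom names: head, positive body, negated body; the function names are hypothetical). It follows the min/max definition directly, including the conventions for the empty conjunction and the empty disjunction.

```python
def value_body(pos, neg, interpretation):
    """Minimum over the body literals; the empty conjunction has value 1 (true)."""
    values = [1 if b in interpretation else 0 for b in pos]
    values += [0 if c in interpretation else 1 for c in neg]  # value of "not c"
    return min(values, default=1)


def value_head(head, interpretation):
    """Maximum over the head atoms; the empty disjunction has value 0 (false)."""
    return max((1 if a in interpretation else 0 for a in head), default=0)


def satisfies(interpretation, rule):
    head, pos, neg = rule
    return value_head(head, interpretation) >= value_body(pos, neg, interpretation)


def is_model(interpretation, ground_program):
    return all(satisfies(interpretation, rule) for rule in ground_program)


# Illustrative ground program: the fact "a v b <-" and the constraint "<- a".
example = [(("a", "b"), (), ()), ((), ("a",), ())]
```

Here the interpretation {b} is a model, whereas {a} violates the constraint and the empty interpretation violates the disjunctive fact.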

The model-theoretic semantics for a positive program $\mathcal{P}$ assigns the set of
its minimal models $\mathcal{MM}(\mathcal{P})$. A model $M$ for $\mathcal{P}$ is minimal
if no proper subset of $M$ is a model for $\mathcal{P}$ (Minker, 1994). Accordingly, the
program $\mathcal{P} = \{a \vee b \leftarrow\}$ has the two minimal models $\{a\}$ and
$\{b\}$, i.e. $\mathcal{MM}(\mathcal{P}) = \{\{a\}, \{b\}\}$. The more general
disjunctive stable model semantics generalizes stable model semantics, previously
defined for normal programs (Gelfond & Lifschitz, 1988), and also applies to programs
with (unstratified) negation (Gelfond & Lifschitz, 1991; Przymusinski, 1991).

Let $\mathcal{P}$ be a logic program and let $I$ be an interpretation for $\mathcal{P}$;
$\mathcal{P}^I$ denotes the ground positive program derived from $ground(\mathcal{P})$
by (1) removing all rules that contain a negative literal $\neg a$ in the body with
$a \in I$, and (2) removing all negative literals from the remaining rules. An
interpretation $M$ is a (disjunctive) stable model for $\mathcal{P}$ if and only if
$M \in \mathcal{MM}(\mathcal{P}^M)$.

For a general program $\mathcal{P}$, the stable model semantics assigns to $\mathcal{P}$
the set $\mathcal{SM}(\mathcal{P})$ of its stable models. It is well known that stable
models are minimal models (i.e.
$\mathcal{SM}(\mathcal{P}) \subseteq \mathcal{MM}(\mathcal{P})$), that for negation-free
programs minimal and stable model semantics coincide (i.e.
$\mathcal{SM}(\mathcal{P}) = \mathcal{MM}(\mathcal{P})$), and that Datalog programs have
a unique minimal model.
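
For very small ground programs, minimal and stable models can be enumerated by brute force, which is a convenient way to check the definitions. The sketch below is an illustration only: the encoding (rules as triples of tuples of atom names) and all function names are assumptions, and the enumeration is exponential in the number of atoms, unlike the algorithms used by real engines such as DLV.

```python
from itertools import combinations


def atoms(program):
    return {a for head, pos, neg in program for a in (*head, *pos, *neg)}


def subsets(base):
    base = sorted(base)
    return [frozenset(c) for r in range(len(base) + 1) for c in combinations(base, r)]


def is_model(interpretation, program):
    for head, pos, neg in program:
        body_true = (all(b in interpretation for b in pos)
                     and not any(c in interpretation for c in neg))
        if body_true and not any(a in interpretation for a in head):
            return False
    return True


def minimal_models(program, base):
    models = [i for i in subsets(base) if is_model(i, program)]
    # Keep the models that have no proper subset which is also a model.
    return {m for m in models if not any(other < m for other in models)}


def reduct(program, interpretation):
    """Drop rules with 'not c' where c is in the interpretation; drop remaining negation."""
    return [(head, pos, ()) for head, pos, neg in program
            if not any(c in interpretation for c in neg)]


def stable_models(program):
    base = atoms(program)
    return {m for m in subsets(base) if m in minimal_models(reduct(program, m), base)}


guess = [(("a", "b"), (), ())]                 # a v b <-
guess_and_check = guess + [((), ("a",), ())]   # plus the constraint <- a
unstratified = [(("p",), (), ("q",)), (("q",), (), ("p",))]  # p <- not q.  q <- not p.
```

On the program {a ∨ b ←} both functions return {a} and {b}, matching the example in the text, while adding the constraint ← a leaves {b} as the only stable model.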

Predicate symbols can be either extensional (i.e. defined by the ground facts of a 
database — EDB predicate symbols), also called base predicates, or intensional (i.e. 
defined by the rules of the program — IDB predicate symbols) , also called derived 
predicates. Thus a database $D$ consists of a set of ground facts having in the head
a base predicate symbol (i.e. ground normal rules with empty body defining base
predicates), whereas a program $\mathcal{P}$ consists of a set of (disjunctive) rules
having in the heads derived predicate symbols.

A disjunctive Datalog query over a database defines a mapping from the database
to a finite (possibly empty) set of finite (possibly empty) relations for the goal. A
query is a pair $(G, \mathcal{P})$ where $G$ is an atom, called a goal, and $\mathcal{P}$
is a program. The application of a query $Q$ to a database $D$ is denoted by $Q(D)$ and
the union of the program $\mathcal{P}$ and the facts in $D$ is denoted by
$\mathcal{P}_D$. Clearly, all models for $\mathcal{P}_D$ contain the database $D$.

The result of a query $Q = (G, \mathcal{P})$ on an input database $D$ is defined in
terms of the stable models of $\mathcal{P}_D$, by taking either the union (possible
inference) or the intersection (certain inference) of all models. Thus, given a program
$\mathcal{P}$ and a database $D$, a ground atom $G$ is true, under possible (brave)
semantics, if there exists a stable model $M$ for $\mathcal{P}_D$ such that $G \in M$.
Analogously, $G$ is true, under certain (cautious) semantics, if $G$ is true in every
stable model for $\mathcal{P}_D$.

Given an atom $G$ and an interpretation $M$, $A(G, M)$ denotes the set of substitutions
for the variables in $G$ such that $G$ is true in $M$. The answer to a query
$Q = (G, \mathcal{P})$ over a database $D$ under possible (resp. certain) semantics,
denoted $Ans_p(Q, D)$ (resp. $Ans_c(Q, D)$), is the relation $\bigcup_M A(G, M)$ (resp.
$\bigcap_M A(G, M)$), where $M \in \mathcal{SM}(\mathcal{P}_D)$. Two queries
$Q_1 = (G_1, \mathcal{P}_1)$ and $Q_2 = (G_2, \mathcal{P}_2)$ are said to be equivalent
under semantics $s$ ($Q_1 \equiv_s Q_2$) if for every database $D$ (on a fixed schema)
$Ans_s(Q_1, D) = Ans_s(Q_2, D)$. Moreover, we say that two programs $\mathcal{P}_1$ and
$\mathcal{P}_2$ are equivalent under a given semantics $s$
($\mathcal{P}_1 \equiv_s \mathcal{P}_2$) if for every atom $g$,
$(g, \mathcal{P}_1) \equiv_s (g, \mathcal{P}_2)$. Finally, if $Q_1 \equiv_p Q_2$ and
$Q_1 \equiv_c Q_2$ (the two queries or programs are equivalent under both brave and
cautious semantics) we simply write $Q_1 \equiv Q_2$.
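
The difference between possible (brave) and certain (cautious) answers is easy to see once the stable models are available. The following sketch assumes that the stable models have already been computed by some engine and are given as sets of ground atoms encoded as (predicate, arguments) pairs; the function names and the sample models are illustrative assumptions.

```python
def is_variable(term):
    # Assumed convention: variables are strings starting with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()


def substitutions(goal, model):
    """A(G, M): the substitutions for the variables of G that make G true in M."""
    pred, args = goal
    found = set()
    for fact_pred, values in model:
        if fact_pred != pred or len(values) != len(args):
            continue
        theta, ok = {}, True
        for arg, value in zip(args, values):
            if is_variable(arg):
                if theta.setdefault(arg, value) != value:
                    ok = False
                    break
            elif arg != value:
                ok = False
                break
        if ok:
            found.add(tuple(sorted(theta.items())))
    return found


def brave_answer(goal, models):
    """Union of A(G, M) over the given stable models."""
    return set().union(*(substitutions(goal, m) for m in models))


def cautious_answer(goal, models):
    """Intersection of A(G, M); returning the empty set when there are no models
    is a convention chosen for this sketch."""
    answers = [substitutions(goal, m) for m in models]
    return set.intersection(*answers) if answers else set()


# Two hand-written stable models: either john is the father of mary or her brother.
m1 = {("related", ("john", "mary")), ("father", ("john", "mary")),
      ("ancestor", ("john", "mary"))}
m2 = {("related", ("john", "mary")), ("brother", ("john", "mary"))}
goal = ("ancestor", ("john", "Y"))
```

For this goal the brave answer contains Y = mary, while the cautious answer is empty because the second model derives no ancestor atom.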



2.2 Magic-Set rewriting 

In the literature different approaches have been proposed for the efficient bottom-up
evaluation of queries, e.g. the Magic-Set (Bancilhon et al., 1986), the supplementary
Magic-Set (Beeri & Ramakrishnan, 1991) and other specialized rewriting techniques
(Greco et al., 1995; Ullman, 1989a; Ramakrishnan et al., 1993). The key idea of all
these techniques consists in the rewriting of deductive rules with respect to the query
goal, so as to answer the query without actually computing irrelevant facts. In this
section the Magic-Set rewriting is recalled, which is a general and well-known technique
for the optimization of Datalog queries. Although the Magic-Set technique can be applied
to general Datalog queries, for the sake of simplicity, here the technique for linear
programs is presented, i.e. programs whose rules contain, at most, one body predicate
mutually recursive with the head predicate.

The Magic-Set rewriting consists of three separate steps: 

1. An Adornment step in which the relationship between a bound argument in 
the rule head and the bindings in the rule body is made explicit. 

2. A Generation step in which the adorned program is used to generate the 
magic rules which simulate the top-down evaluation scheme. 

3. A Modification step in which the adorned rules are modified by the magic 
rules generated in Step (2); these rules will be called modified rules. 

An adorned program is a program whose predicate symbols have an associated
string $\alpha$, defined on the alphabet $\{b, f\}$, of length equal to the arity of the
predicate. A character $b$ (resp. $f$) in the $i$-th position of the adornment associated
with a predicate $p$ means that the $i$-th argument of $p$ is bound (resp. free).

The adornment step consists in generating a new program whose predicates are
adorned. Given a rule $r$ and an adornment $\alpha$ of the rule head, the adorned version
of $r$ is derived as follows:

1. identify the distinctive arguments of the rule as follows: an argument is
distinctive if it is bound in the adornment $\alpha$, is a constant or appears in a base
predicate of the rule-body which includes an adornment argument;

2. assume that the distinctive arguments are bound and use this information in
the adornment of the derived predicates in the rule body.

Adornments containing only $f$ symbols can be omitted.
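
The two adornment steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: atoms are encoded as (predicate, arguments) pairs, variables are strings starting with an upper-case letter, and the phrase "includes an adornment argument" is read as "includes an argument that is already distinctive", applied repeatedly until nothing changes.

```python
def is_variable(term):
    # Assumed convention: variables are strings starting with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()


def adorn_rule(head, body, alpha, derived):
    """Adorn the derived body atoms of a rule whose head has adornment alpha.

    derived is the set of derived (IDB) predicate names; the result lists the body
    atoms as (predicate, adornment, arguments), with an empty adornment for base atoms.
    """
    _, head_args = head
    # Step 1a: variables bound by the head adornment are distinctive.
    bound = {t for t, c in zip(head_args, alpha) if c == "b" and is_variable(t)}
    # Step 1b: a base atom containing a distinctive argument (or a constant)
    # makes all of its variables distinctive; iterate to a fixpoint.
    changed = True
    while changed:
        changed = False
        for pred, args in body:
            if pred in derived:
                continue
            if any(not is_variable(t) or t in bound for t in args):
                fresh = {t for t in args if is_variable(t)} - bound
                if fresh:
                    bound |= fresh
                    changed = True

    # Step 2: distinctive arguments are treated as bound in derived body atoms.
    def adornment(args):
        return "".join("b" if not is_variable(t) or t in bound else "f" for t in args)

    return [(pred, adornment(args) if pred in derived else "", args)
            for pred, args in body]


derived = {"p", "q"}
rule_p = adorn_rule(("p", ("X", "C")), [("q", ("X", 2, "C"))], "bf", derived)
rule_q = adorn_rule(
    ("q", ("X", "Y", "C")),
    [("b", ("X", "Y", "Z", "W")), ("q", ("Z", "W", "D")), ("c", ("D", "C"))],
    "bbf",
    derived,
)
```

Under this reading, the recursive rule for q with head adornment bbf yields the body atom q with adornment bbf, because Z and W are bound through the base atom b(X,Y,Z,W), while D stays free.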

Given a query $Q = (q(T), \mathcal{P})$ and letting $\alpha$ be the adornment associated with



10 G. Greco, S. Greco, I. Trubitsyna and E. Zumpano 

$q(T)$, the set of adorned rules for $Q$ is generated by 1) first computing the adorned
version of the rules defining $q$ and 2) then generating, for each new adorned
predicate $p^{\alpha}$ introduced in the previous step, the adorned version of the rules
defining $p$ w.r.t. $\alpha$; Step 2 is repeated until no new adorned predicate is generated.

The second step of the process consists in using the adorned program for the 
generation of the magic rules. For each of the adorned predicates in the body of 
the adorned rule: 

1. eliminate all the derived predicates in the rule body which are not mutually
recursive with the rule head;

2. replace the derived predicate symbol $p^{\alpha}$ with $magic\_p^{\alpha}$ and
eliminate the variables which are free w.r.t. $\alpha$;

3. replace the head predicate symbol $q^{\alpha}$ with $magic\_q^{\alpha}$ and
eliminate the variables which are free w.r.t. $\alpha$;

4. interchange the transformed head and derived predicate in the body.

Finally, the modification step of an adorned rule is performed as follows: for each
adorned rule whose head is $p^{\alpha}(X)$, where $X$ is a list of variables, extend the
rule body with $magic\_p^{\alpha}(X')$, where $X'$ is the list of variables in $X$ which
are bound w.r.t. $\alpha$.

The final program will contain only the rules which are needed to answer the 
query. 

Example 1

Consider the query $Q_3 = (p(1, C), \mathcal{P}_3)$ where $\mathcal{P}_3$ is defined as follows:

$p(X, C) \leftarrow q(X, 2, C).$

$q(X, Y, C) \leftarrow a(X, Y, C).$
$q(X, Y, C) \leftarrow b(X, Y, Z, W), q(Z, W, D), c(D, C).$

The adorned version of $\mathcal{P}_3$ w.r.t. the query goal $p(1, C)$ is:

$p^{bf}(X, C) \leftarrow q^{bbf}(X, 2, C).$

$q^{bbf}(X, Y, C) \leftarrow a(X, Y, C).$

$q^{bbf}(X, Y, C) \leftarrow b(X, Y, Z, W), q^{bbf}(Z, W, D), c(D, C).$

The rewritten query is $Q_3' = (p^{bf}(1, C), \mathcal{P}_3')$ where $\mathcal{P}_3'$ is as follows:

$magic\_p^{bf}(1).$

$magic\_q^{bbf}(X, 2) \leftarrow magic\_p^{bf}(X).$

$magic\_q^{bbf}(Z, W) \leftarrow magic\_q^{bbf}(X, Y), b(X, Y, Z, W).$

$p^{bf}(X, C) \leftarrow magic\_p^{bf}(X), q^{bbf}(X, 2, C).$
$q^{bbf}(X, Y, C) \leftarrow magic\_q^{bbf}(X, Y), a(X, Y, C).$
$q^{bbf}(X, Y, C) \leftarrow magic\_q^{bbf}(X, Y), b(X, Y, Z, W), q^{bbf}(Z, W, D), c(D, C).$

Note that the first set of rules consists of the magic rules generated in the second
step, while the second set of rules consists of the modified rules. □
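
The effect of the rewriting in Example 1 can be observed with a naive bottom-up evaluator. The sketch below is illustrative only: the evaluator, the encoding of adorned predicates as names such as `q_bbf`, and the sample database are assumptions introduced here, and real systems use far more efficient (e.g. semi-naive) evaluation.

```python
def is_variable(term):
    # Assumed convention: variables are strings starting with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()


def unify(args, values, theta):
    """Extend the substitution theta so that args matches the ground tuple values."""
    theta = dict(theta)
    for arg, value in zip(args, values):
        if is_variable(arg):
            if theta.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return theta


def evaluate(rules, facts):
    """Naive fixpoint for positive, non-disjunctive rules of the form (head, body)."""
    db = set(facts)
    while True:
        new = set()
        for (head_pred, head_args), body in rules:
            thetas = [{}]
            for pred, args in body:
                thetas = [t2 for t in thetas for (p, values) in db
                          if p == pred and len(values) == len(args)
                          for t2 in [unify(args, values, t)] if t2 is not None]
            for t in thetas:
                new.add((head_pred, tuple(t.get(a, a) for a in head_args)))
        if new <= db:
            return db
        db |= new


rules_orig = [
    (("p", ("X", "C")), [("q", ("X", 2, "C"))]),
    (("q", ("X", "Y", "C")), [("a", ("X", "Y", "C"))]),
    (("q", ("X", "Y", "C")),
     [("b", ("X", "Y", "Z", "W")), ("q", ("Z", "W", "D")), ("c", ("D", "C"))]),
]
rules_magic = [
    (("magic_p_bf", (1,)), []),
    (("magic_q_bbf", ("X", 2)), [("magic_p_bf", ("X",))]),
    (("magic_q_bbf", ("Z", "W")),
     [("magic_q_bbf", ("X", "Y")), ("b", ("X", "Y", "Z", "W"))]),
    (("p_bf", ("X", "C")), [("magic_p_bf", ("X",)), ("q_bbf", ("X", 2, "C"))]),
    (("q_bbf", ("X", "Y", "C")), [("magic_q_bbf", ("X", "Y")), ("a", ("X", "Y", "C"))]),
    (("q_bbf", ("X", "Y", "C")),
     [("magic_q_bbf", ("X", "Y")), ("b", ("X", "Y", "Z", "W")),
      ("q_bbf", ("Z", "W", "D")), ("c", ("D", "C"))]),
]
# Made-up database: the a-facts with first argument 7 and 9 are irrelevant to p(1, C).
facts = {("a", (3, 4, "u")), ("a", (7, 8, "w")), ("a", (9, 2, "z")),
         ("b", (1, 2, 3, 4)), ("c", ("u", "v"))}

db_orig = evaluate(rules_orig, facts)
db_magic = evaluate(rules_magic, facts)
answers_orig = {v[1] for (p, v) in db_orig if p == "p" and v[0] == 1}
answers_magic = {v[1] for (p, v) in db_magic if p == "p_bf" and v[0] == 1}
```

On this database both programs give the same answer for the goal, but the rewritten program derives only the q-facts reachable from the binding X = 1, which is the saving the magic predicates are meant to achieve.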

Observe that, although the technique presented here applies only to negation-free
linear programs, the Magic-Set rewriting is general and can also be applied to
non-linear programs with some form of negation (e.g. stratified negation) where bindings
are also propagated through derived predicates (Beeri & Ramakrishnan, 1991).

Let $Q = (G, \mathcal{P})$ be a query, then $Magic(Q)$ denotes the query derived from $Q$
by applying the Magic-Set technique. The query $Magic(Q)$ will also be denoted as
$(G^{\alpha}, magic(G, \mathcal{P}))$ where $G^{\alpha}$ denotes the adorned version of
$G$, and $magic(G, \mathcal{P})$ denotes the rewriting of $\mathcal{P}$ w.r.t. the goal
$G$. The rewritten program consists of two distinct sets of rules: a set of new rules
(generated in Step (2)), called magic rules, and the set of modified rules (generated in
Step (3)), which is derived from the set of rules in the source program. The adorned
rules generated in Step (1) are denoted by $Adorn(G, \mathcal{P})$.



3 Binding Propagation for Positive Queries 

In this section the technique proposed in (Greco, 1999; Greco, 2003) for applying
the Magic-Set technique to positive disjunctive Datalog programs is reviewed
and extended. The technique proposed in (Greco, 1999; Greco, 2003) produces a
rewriting of the source query which, under the bottom-up evaluation, simulates the
propagation of bindings occurring in the query-goal, performed in the top-down
evaluation.

It should be pointed out that this formalization stems from the approach in
(Greco, 2003). However, it is further extended by providing a new result on query
equivalence (not stated in (Greco, 2003)), which is crucial for allowing the technique
to work in the case of disjunctive programs with constraint rules. This result
states that the rewriting technique is independent of the particular
strategy adopted for simulating the propagation of bindings carried out in top-down
evaluation. Therefore, even though conceptually introduced as an extension of the
Magic-Set technique, this approach is orthogonal to the Magic-Set technique, since
it can use any other rewriting strategy proposed in the literature.

For the sake of simplicity, the following running example is considered. 

Example 2

Consider the query $(ancestor(john, Y), ANC)$ where the program $ANC$ consists of the
following rules:

$father(X, Y) \vee brother(X, Y) \leftarrow related(X, Y).$

$ancestor(X, Y) \leftarrow father(X, Y).$

$ancestor(X, Y) \leftarrow father(X, Z), ancestor(Z, Y).$

The predicate ancestor defines the transitive closure of father, while father is
defined by a disjunctive rule. □

Given a positive disjunctive Datalog program $\mathcal{P}$, the first step is to construct a
suitable normal program.

Definition 1 

Let $\mathcal{P}$ be a (positive) disjunctive Datalog program. The extended standard version
of $\mathcal{P}$, denoted $esv(\mathcal{P})$, is the Datalog program derived from $\mathcal{P}$ by replacing each
disjunctive rule $a_1 \vee \dots \vee a_m \leftarrow B$ with

1. $m$ rules of the form $a_i \leftarrow B$ for $1 \le i \le m$, and

2. $m \times (m - 1)$ rules of the form $a_i \leftarrow a_j, B$ for $1 \le i, j \le m$ and $i \ne j$.

Given a query $Q = \langle G, \mathcal{P} \rangle$, we denote with $esv(Q)$ the query $\langle G, esv(\mathcal{P}) \rangle$. □

Example 3 

The program esv(ANC), where ANC is the program presented in Example 2, consists
of the following rules:

father(X, Y) ← related(X, Y).
brother(X, Y) ← related(X, Y).
father(X, Y) ← brother(X, Y), related(X, Y).
brother(X, Y) ← father(X, Y), related(X, Y).
ancestor(X, Y) ← father(X, Y).
ancestor(X, Y) ← father(X, Z), ancestor(Z, Y).

□
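
The construction of Definition 1 can be sketched in a few lines of Python. This is only an illustration under simple assumptions: atoms are plain strings, a rule is a pair of lists, and the helper names `esv_rule` and `show` are hypothetical, not part of the original system.

```python
from itertools import permutations

def esv_rule(head, body):
    """Return the extended standard version of one (possibly disjunctive) rule.

    head: list of head atoms (strings); body: list of body atoms (strings).
    Each output rule is a pair (head_atom, body_list).
    """
    # Item 1: one rule a_i <- B for every head atom.
    rules = [(a, list(body)) for a in head]
    # Item 2: one rule a_i <- a_j, B for every ordered pair with i != j.
    rules += [(ai, [aj] + list(body)) for ai, aj in permutations(head, 2)]
    return rules

def show(rule):
    """Render a rule in a Datalog-like textual form."""
    head, body = rule
    return f"{head} <- {', '.join(body)}."

# The disjunctive rule of the running example (Example 2).
esv_anc = esv_rule(["father(X,Y)", "brother(X,Y)"], ["related(X,Y)"])
for r in esv_anc:
    print(show(r))
```

For a head with m atoms the function returns m + m(m - 1) rules, so the disjunctive rule of the running example yields the four rules listed first in Example 3; a rule with a single head atom is returned unchanged.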

Observe that the rules introduced by applying Item 2 of Definition 1 are subsumed
by those introduced by applying Item 1; indeed, the semantics of $esv(\mathcal{P})$ is not
affected by the insertion of these rules. However, these rules are necessary in order
to allow the propagation of the bindings, as will be clear in the following.

It should be pointed out that programs $\mathcal{P}$ and $esv(\mathcal{P})$ are not equivalent; in fact,
$\mathcal{P}$ is a disjunctive Datalog program that is able to express all the queries in $\Sigma_2^P$,
while $esv(\mathcal{P})$ is a positive Datalog program that is able to express a subset of the
queries computable in polynomial time.

The second step is to derive a program that must be equivalent to the original 
one. Let us first present some notation. 

Definition 2 

Given a (positive) disjunctive Datalog program $\mathcal{P}$, $ESV(\mathcal{P})$ denotes the program
derived from $esv(\mathcal{P})$ by replacing each derived predicate symbol $g$ with a new
predicate symbol $G$. Given a query $Q = \langle g(t), \mathcal{P} \rangle$, $ESV(Q)$ denotes the query
$\langle G(t), ESV(\mathcal{P}) \rangle$ where $G$ is the new symbol used to replace $g$. □

Definition 3 

Let $\mathcal{P}$ be a (positive) disjunctive Datalog program. The restricted version of $\mathcal{P}$,
denoted by $RV(\mathcal{P})$, is the disjunctive Datalog program defined as follows:

$$RV(\mathcal{P}) = \{\, a_1 \vee \dots \vee a_m \leftarrow A_1, \dots, A_m, B \mid a_1 \vee \dots \vee a_m \leftarrow B \in \mathcal{P} \,\}$$

where each $A_i$, for $1 \le i \le m$, is the atom replacing $a_i$ in the program $ESV(\mathcal{P})$. The
rewritten version of $\mathcal{P}$ is $Rew(\mathcal{P}) = RV(\mathcal{P}) \cup ESV(\mathcal{P})$. Given a query $Q = \langle g(t), \mathcal{P} \rangle$,
$Rew(Q)$ denotes the query $\langle g(t), Rew(\mathcal{P}) \rangle$. □

Observe that in the above definition, the rewriting of the program $\mathcal{P}$ does not
take into account the goal. This is not true for rewriting techniques propagating
bindings.

Example 4 

The program Rew(ANC), where ANC is the program defined in Example 2, consists
of the following rules RV(ANC):

father(X, Y) ∨ brother(X, Y) ← FATHER(X, Y), BROTHER(X, Y), related(X, Y).
ancestor(X, Y) ← ANCESTOR(X, Y), father(X, Y).
ancestor(X, Y) ← ANCESTOR(X, Y), father(X, Z), ancestor(Z, Y).

plus the set of rules ESV(ANC):

FATHER(X, Y) ← related(X, Y).
BROTHER(X, Y) ← related(X, Y).
FATHER(X, Y) ← BROTHER(X, Y), related(X, Y).
BROTHER(X, Y) ← FATHER(X, Y), related(X, Y).
ANCESTOR(X, Y) ← FATHER(X, Y).
ANCESTOR(X, Y) ← FATHER(X, Z), ANCESTOR(Z, Y).

The rewritten query is ⟨ancestor(john, Y), Rew(ANC)⟩. □
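
As a rough sketch of Definitions 2 and 3, the following Python fragment builds RV(ANC) and ESV(ANC). The encoding is an assumption made for illustration only: an atom is a pair (predicate, arguments), a rule is a pair (head atoms, body atoms), renamed predicates are written in upper case as in the example, and the names `rv`, `esv_renamed` and `fmt` are hypothetical.

```python
def upper(atom):
    """Rename a predicate to its upper-case twin, e.g. father -> FATHER."""
    pred, args = atom
    return (pred.upper(), args)

def rv(program):
    """Restricted version: prepend the renamed head atoms A_1..A_m to each body."""
    return [(head, [upper(a) for a in head] + body) for head, body in program]

def esv_renamed(program):
    """ESV: the esv rules, with every derived predicate symbol renamed."""
    derived = {a[0] for head, _ in program for a in head}
    rename = lambda a: upper(a) if a[0] in derived else a
    out = []
    for head, body in program:
        new_body = [rename(b) for b in body]
        for i, ai in enumerate(head):
            out.append(([upper(ai)], new_body))                    # A_i <- B
            for j, aj in enumerate(head):
                if i != j:
                    out.append(([upper(ai)], [upper(aj)] + new_body))  # A_i <- A_j, B
    return out

def fmt(rule):
    """Render a rule in a Datalog-like textual form."""
    head, body = rule
    atom = lambda a: f"{a[0]}({', '.join(a[1])})"
    return " v ".join(map(atom, head)) + " <- " + ", ".join(map(atom, body)) + "."

X, Y, Z = "X", "Y", "Z"
ANC = [
    ([("father", (X, Y)), ("brother", (X, Y))], [("related", (X, Y))]),
    ([("ancestor", (X, Y))], [("father", (X, Y))]),
    ([("ancestor", (X, Y))], [("father", (X, Z)), ("ancestor", (Z, Y))]),
]
rew = rv(ANC) + esv_renamed(ANC)   # Rew(ANC) = RV(ANC) U ESV(ANC)
for rule in rew:
    print(fmt(rule))
```

The nine printed rules coincide, up to ordering, with those of Example 4. Note that the goal plays no role in this rewriting.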

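The programs $\mathcal{P}$ and $Rew(\mathcal{P})$ actually have the same semantics (w.r.t. the
predicates in $\mathcal{P}$).

Proposition 1

Let $\mathcal{P}$ be a (positive) disjunctive Datalog program, and $Rew(\mathcal{P})$ be the rewritten
version of $\mathcal{P}$. Then, for every atom $g(t)$, $\langle g(t), \mathcal{P} \rangle \equiv \langle g(t), Rew(\mathcal{P}) \rangle$.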

Proof. In order to prove that for any $g(t)$, $\langle g(t), \mathcal{P} \rangle \equiv \langle g(t), Rew(\mathcal{P}) \rangle$, under
both possible and certain semantics, it will be shown that for any database $D$,
an interpretation $M$ of $Rew(\mathcal{P})_D$ is a stable model if, and only if, $M[\mathcal{P}_D]$ is a
stable model for $\mathcal{P}_D$. Moreover, only minimal models can be considered, since for
disjunctive positive Datalog programs, the set of stable models of a given program
coincides with the set of minimal models.

It should be pointed out that the program $Rew(\mathcal{P})$ consists of two distinct
components: i) the program $ESV(\mathcal{P})$, whose rules only depend on $D$, and ii) the program
$RV(\mathcal{P})$, whose rules depend on predicates defined in $ESV(\mathcal{P})$ and $D$. Hence, the set
of stable models of $Rew(\mathcal{P})_D$ can be computed in a level-wise manner.

As for the models of $ESV(\mathcal{P})_D$, observe that $ESV(\mathcal{P})_D$ has a unique minimal
model, since it is a positive program. Let $M_{ESV}$ be such a model, and let $M_{esv} =
\{a \mid A \in M_{ESV}\}$.

Then, $MM(ESV(\mathcal{P})_D \cup RV(\mathcal{P})_D) = MM(FM_{ESV} \cup RV(\mathcal{P})_D)$, where $FM_{ESV}$
denotes the set of facts associated with each atom in the model $M_{ESV}$. Finally, in
order to conclude the proof, it can be claimed that $M_{ESV} \cup M$ is a minimal model
of $FM_{ESV} \cup RV(\mathcal{P})_D$ if, and only if, $M$ is a minimal model for $\mathcal{P}_D$.

In fact, consider a rule $r : a_1 \vee \dots \vee a_m \leftarrow \mathbf{b}, b_1, \dots, b_n$ in $\mathcal{P}$ and the corresponding
rule $r' : a_1 \vee \dots \vee a_m \leftarrow A_1, \dots, A_m, \mathbf{b}, b_1, \dots, b_n$ in $RV(\mathcal{P})$, where $\mathbf{b}$ is a conjunction
of derived predicates, while $b_1, \dots, b_n$ are extensional predicates. It is easy to see
that for any model $M$ for $\mathcal{P}_D$, we have $M \subseteq M_{esv}$. Hence, if $\mathbf{b}, b_1, \dots, b_n$ is true in
$M$, then also $\mathbf{B}, b_1, \dots, b_n$ is true in $M_{ESV}$. Then, due to the rules $A_i \leftarrow \mathbf{B}, b_1, \dots, b_n$
in $ESV(\mathcal{P})$, $A_1, \dots, A_m$ is true in $M_{ESV}$, too.

Conversely, if $\mathbf{b}$ is false in $M$, then the body of $r'$ is trivially false (no matter
what the evaluation of $\mathbf{B}$), too. Hence, after the assertion of the facts in $M_{ESV}$,
the semantics of $RV(\mathcal{P})_D$ is not affected if the predicates $A_1, \dots, A_m$ are
removed from $r'$. □

The previous proposition states that the programs $\mathcal{P}$ and $Rew(\mathcal{P})$, obtained by
restricting the rules of $\mathcal{P}$, are equivalent. In fact, adding the atoms defined in
$ESV(\mathcal{P})$ inside the body of the rules of $\mathcal{P}$ does not make any effective restriction,
as for every ground atom $a(t)$ appearing in the head of a disjunctive ground rule $r$
there is a ground atom $A(t)$ which is derived from $ESV(\mathcal{P})$.
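
The equivalence can be checked by brute force on a tiny ground instance. The sketch below is not the proof; it merely enumerates minimal models of a hypothetical propositional program with one disjunctive rule `a v b <- r` and the fact `r`, and of its rewritten version. The encoding and the function names are assumptions made for illustration.

```python
from itertools import combinations

def is_model(m, rules):
    """A set of atoms is a model if every rule with a true body has a true head atom."""
    return all(not set(body) <= m or set(head) & m for head, body in rules)

def minimal_models(rules):
    """Enumerate all interpretations and keep the subset-minimal models."""
    atoms = sorted({a for head, body in rules for a in head + body})
    models = [frozenset(c)
              for n in range(len(atoms) + 1)
              for c in combinations(atoms, n)
              if is_model(set(c), rules)]
    return {m for m in models if not any(other < m for other in models)}

# P_D: the fact r and the ground disjunctive rule a v b <- r.
P = [(["r"], []), (["a", "b"], ["r"])]
# Rew(P)_D: RV(P) (first rule after the fact) plus ESV(P), with A, B the renamed atoms.
REW = [
    (["r"], []),
    (["a", "b"], ["A", "B", "r"]),
    (["A"], ["r"]), (["B"], ["r"]),
    (["A"], ["B", "r"]), (["B"], ["A", "r"]),
]

models_p = minimal_models(P)
# Restrict each model of the rewritten program to the atoms of the source program.
models_rew = {frozenset(m & {"a", "b", "r"}) for m in minimal_models(REW)}
print(models_p == models_rew)
```

On this instance both programs have the two minimal models {r, a} and {r, b} once the auxiliary atoms A and B are projected away, which is what Proposition 1 predicts.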

Thus, in the following, instead of using the program $ESV(\mathcal{P})$ to restrict the rules
in $\mathcal{P}$, a different program is considered which makes an effective and sound
restriction; this program will be obtained by performing the binding propagation from
the query goal into the rules of the program $esv(\mathcal{P})$. However, program $ESV(\mathcal{P})$
is used instead of program $esv(\mathcal{P})$, in order to distinguish between atoms of the
source program and atoms making restrictions.

Throughout the paper the Magic-Set technique is considered, since it is a
well-known and general technique. For special classes of queries specialized techniques
could be used, as the choice of the rewriting technique is independent of, and
orthogonal to, the proposed framework.

For details about the Magic-Set technique the reader should refer to
(Beeri & Ramakrishnan, 1991; Ullman, 1989a), while for specialized rewriting
techniques see (Ullman, 1989a; Ramakrishnan et al., 1993; Greco et al., 1995).

We now formally provide a way of "collecting" all the adorned predicates
generated by the rewriting of queries.

Definition 4 

Given a (positive) disjunctive Datalog program $\mathcal{P}$, a predicate symbol $p$ appearing
in $\mathcal{P}$ and an adorned program $\mathcal{P}^{\alpha}$ derived from $\mathcal{P}$, then

$$Coll(p, \mathcal{P}, \mathcal{P}^{\alpha}) = \{\, p(X_1, \dots, X_k) \leftarrow p^{\alpha}(X_1, \dots, X_k) \mid \text{for every } p^{\alpha} \text{ in } \mathcal{P}^{\alpha} \text{ derived from } p \,\}$$

denotes the set of rules (also called collecting rules) used to collect the atoms of
the predicates having different adornments, but the same predicate symbols.
Moreover,

$$Coll(\mathcal{P}, \mathcal{P}^{\alpha}) = \bigcup_{p \text{ appearing in } \mathcal{P}} Coll(p, \mathcal{P}, \mathcal{P}^{\alpha})$$

denotes the set of all collecting rules derived from $\mathcal{P}$ and $\mathcal{P}^{\alpha}$. □

With respect to the rewriting of Definition 3, $ESV(\mathcal{P})$ is now replaced by the
rules of the Magic-Set program $\mathcal{P}^{\alpha} = Magic(G(t), ESV(\mathcal{P}))$ (where $G$ is the symbol
corresponding to $g$ in $ESV(\mathcal{P})$) plus the rules used to collect the atoms with
different adornments of $\mathcal{P}$, denoted by $Coll(ESV(\mathcal{P}), \mathcal{P}^{\alpha})$.

Example 5 

Consider the program ANC of Example 2 and the query goal ancestor(john, Y).
The corresponding extended standard query is ESV(⟨ancestor(john, Y), ANC⟩) =
⟨ANCESTOR(john, Y), ESV(ANC)⟩. The first step consists in the generation of
adornments for derived predicates. From the propagation of bindings only one binding
for the predicate ANCESTOR^bf(john, Y) is derived.

Thus, the program Magic(ANCESTOR(john, Y), ESV(ANC)) is as follows:

FATHER^bf(X, Y) ← magic_FATHER^bf(X), related(X, Y).
BROTHER^bf(X, Y) ← magic_BROTHER^bf(X), related(X, Y).
FATHER^bf(X, Y) ← magic_FATHER^bf(X), BROTHER^bf(X, Y), related(X, Y).
BROTHER^bf(X, Y) ← magic_BROTHER^bf(X), FATHER^bf(X, Y), related(X, Y).
ANCESTOR^bf(X, Y) ← magic_ANCESTOR^bf(X), FATHER^bf(X, Y).
ANCESTOR^bf(X, Y) ← magic_ANCESTOR^bf(X), FATHER^bf(X, Z), ANCESTOR^bf(Z, Y).

magic_ANCESTOR^bf(john).
magic_ANCESTOR^bf(Z) ← magic_ANCESTOR^bf(X), FATHER^bf(X, Z).
magic_FATHER^bf(X) ← magic_ANCESTOR^bf(X).
magic_BROTHER^bf(X) ← magic_FATHER^bf(X).
magic_FATHER^bf(X) ← magic_BROTHER^bf(X).

Here the predicate magic_ANCESTOR^bf computes all ancestors which are relevant to
establish whether a given person is an ancestor of john.
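
To see how the magic predicates restrict the data that is touched, the rules above can be evaluated bottom-up on a small database. The following is a hand-coded sketch for this specific example only, not a general Magic-Set implementation; the function name and the sample `related` facts are hypothetical.

```python
def magic_ancestor(related, start):
    """Naive fixpoint for the magic rules of the running example.

    related: set of (x, y) facts; start: the bound constant of the goal.
    Returns (magic, father): the extensions of magic_ANCESTOR^bf and FATHER^bf.
    """
    magic = {start}                       # magic_ANCESTOR^bf(john).
    while True:
        # magic_FATHER^bf(X) <- magic_ANCESTOR^bf(X), and
        # FATHER^bf(X, Y) <- magic_FATHER^bf(X), related(X, Y).
        father = {(x, y) for (x, y) in related if x in magic}
        # magic_ANCESTOR^bf(Z) <- magic_ANCESTOR^bf(X), FATHER^bf(X, Z).
        new_magic = magic | {z for (_, z) in father}
        if new_magic == magic:
            return magic, father
        magic = new_magic

# Hypothetical database: only the first two facts are reachable from john.
related = {("john", "mary"), ("mary", "ann"), ("bob", "carl"), ("carl", "dave")}
magic, father = magic_ancestor(related, "john")
print(sorted(magic))
print(sorted(father))
```

Only the facts reachable from the constant bound in the goal enter FATHER^bf; the tuples about bob and carl are never considered, which is the intended effect of binding propagation.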

The set Coll(ESV(ANC), Magic(ANCESTOR(john, Y), ESV(ANC))), consisting of the
rules collecting atoms with the same predicate and different adornments, is

ANCESTOR(X, Y) ← ANCESTOR^bf(X, Y).
FATHER(X, Y) ← FATHER^bf(X, Y).
BROTHER(X, Y) ← BROTHER^bf(X, Y).

These rules collect into ANCESTOR (resp. FATHER, BROTHER) all the ANCESTOR (resp.
FATHER, BROTHER) atoms with different adornments. Since there is only one
adornment for each predicate, adornments and collecting rules could be eliminated. □
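
Generating the collecting rules of Definition 4 is mechanical. A minimal sketch follows, assuming that an adorned predicate is given as a (name, adornment) pair and that the adornment string has one letter per argument; the function name is hypothetical.

```python
def collecting_rules(adorned):
    """Build one collecting rule p(X1..Xk) <- p^a(X1..Xk) per adorned predicate.

    adorned: iterable of (predicate, adornment) pairs, e.g. ("ANCESTOR", "bf").
    The arity k is taken to be the length of the adornment string.
    """
    rules = []
    for pred, ad in sorted(set(adorned)):
        args = ", ".join(f"X{i}" for i in range(1, len(ad) + 1))
        rules.append(f"{pred}({args}) <- {pred}^{ad}({args}).")
    return rules

# Adorned predicates occurring in the Magic-Set program of the running example.
coll_anc = collecting_rules([("ANCESTOR", "bf"), ("FATHER", "bf"), ("BROTHER", "bf")])
for rule in coll_anc:
    print(rule)
```

When a predicate occurs with several adornments, one rule per adornment is produced, all with the same head predicate.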

Definition 5 

Let $Q = \langle g(t), \mathcal{P} \rangle$ be a disjunctive Datalog query; then the disjunctive Magic-Set
rewriting of $\mathcal{P}$ w.r.t. $g(t)$, denoted by $Disj\_Magic(g(t), \mathcal{P})$, is the following program:
$RV(\mathcal{P}) \cup Coll(ESV(\mathcal{P}), Magic(G(t), ESV(\mathcal{P}))) \cup Magic(G(t), ESV(\mathcal{P}))$. □



16 G. Greco, S. Greco, I. Trubitsyna and E. Zumpano 

Example 6 

The complete rewriting of the program in Example 2 consists of
the rules in Coll(ESV(ANC), Magic(ANCESTOR(john, Y), ESV(ANC))) ∪
Magic(ANCESTOR(john, Y), ESV(ANC)) (shown in Example 5) plus the rules
in RV(ANC):

father(X, Y) ∨ brother(X, Y) ← FATHER(X, Y), BROTHER(X, Y), related(X, Y).
ancestor(X, Y) ← ANCESTOR(X, Y), father(X, Y).
ancestor(X, Y) ← ANCESTOR(X, Y), father(X, Z), ancestor(Z, Y). □

From the above observation and definition, combined with Proposition 1, the
following result, proved in (Greco, 2003), can be derived.

Fact 1

Let $Q = \langle g(t), \mathcal{P} \rangle$ be a Datalog$^{\vee}$ query; then $Q \equiv \langle g(t), Disj\_Magic(g(t), \mathcal{P}) \rangle$. □



4 Binding Propagation in Datalog$^{\vee}$ Programs with Constraints

This section formally introduces a technique for propagating bindings into Datalog$^{\vee}$
queries with strong or classical constraints, a simple and powerful form of
unstratified negation. A strong constraint is a rule with empty head of the form $\leftarrow B(X)$,
where $B(X)$ is a conjunction of literals and $X$ is a vector of range-restricted
variables, which must be satisfied in each model¹.

Contrary to standard Datalog, where bindings are propagated from the head of
rules into the body, the problem with programs containing constraints is that the
bindings need to be propagated also through the constraints. For instance, if one
is interested in knowing whether $p(1)$ is true in a program where there is the
constraint $\leftarrow p(X), q(X)$, then the truth value of $q(1)$ also needs to be evaluated.

Thus, the truth value of each ground atom in a constraint depends on the truth
value of the other atoms appearing in the same ground constraint, and, hence, in
a more abstract perspective, constraints behave in a similar manner as disjunctive
rules when propagating bindings into their heads.

In the following, a Datalog$^{\vee}$ program $\mathcal{P}$ with constraints will be
denoted by a pair $(\mathcal{P}_R, \mathcal{P}_C)$, where $\mathcal{P}_R$ is a nonempty set of (positive) disjunctive
rules and $\mathcal{P}_C$ is a set of constraints. It is worth noting that, $\mathcal{P}_R$ being a nonempty
set of positive disjunctive rules, the only form of negation contained in $\mathcal{P}$ is that
related to the rewriting of constraints. Moreover, the use of constraints instead of
general (possibly unstratified) negation is not a limitation, since Datalog$^{\vee}$ with
constraints has the same expressive power as Datalog$^{\vee,\neg}$. Indeed, since in
(Eiter et al., 1997a) it has been shown that Datalog$^{\vee,\neg}$ has the same expressivity as
Datalog$^{\vee}$ with stratified negation, whereas in (Zumpano, 2004) it has been shown that
stratified negation can be simulated by using disjunction and constraints, we have that
Datalog$^{\vee}$ with constraints and Datalog$^{\vee}$ with stratified negation have the same
expressive power.

¹ It should be recalled that, under stable model semantics, a constraint $\leftarrow B(X)$ could be
rewritten into an equivalent rule with unstratified negation of the form $p_i \leftarrow B(X), \neg p_i$, where
$p_i$ is a new predicate symbol not defined elsewhere in the program.

As has been shown in (Zumpano, 2004), every rule containing stratified
negation can be rewritten into Datalog$^{\vee}$ rules with constraints. For instance, consider
the following stratified rule, where predicate b does not depend on predicate p:

p(X) ← a(X), ¬b(X)

This rule can be rewritten as:

1. p(X) ← p'(X)
2. p'(X) ∨ b'(X) ← a(X)
3. ← p'(X), b'(X)
4. ← b'(X), ¬b(X)
5. ← p'(X), b(X)

where rule 2 together with constraint 3 states that a is partitioned into p'
and b', whereas constraints 4 and 5 state, respectively, that b' must be a subset of b
and that the intersection between p' and b must be empty. Therefore, tuples being
in a − b must be in p'. Rule 1 is necessary only in the case the predicate p in the
source program is defined by more than one rule. In the following we assume that
a predicate symbol p cannot appear both positively and negatively in two different
constraints. This restriction does not limit the expressive power
of the language, as stratified negation can be emulated by considering the above
restricted form of constraints.
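
The effect of rules 1 to 5 can be checked by brute force on small relations. The sketch below enumerates the guesses allowed by rule 2 and constraint 3 and keeps those that pass constraints 4 and 5; the unary relations `a` and `b` are hypothetical sample data and the function name is an assumption.

```python
from itertools import product

def guesses_for_p(a, b):
    """Return every extension of p' (hence of p) that survives constraints 3-5."""
    a = sorted(a)
    results = []
    # Rule 2 with constraint 3: each element of a goes to exactly one of p', b'.
    for choice in product(("p1", "b1"), repeat=len(a)):
        p1 = {x for x, c in zip(a, choice) if c == "p1"}
        b1 = {x for x, c in zip(a, choice) if c == "b1"}
        # Constraint 4: b' must be a subset of b. Constraint 5: p' and b are disjoint.
        if b1 <= b and not (p1 & b):
            results.append(p1)   # rule 1: p coincides with p'
    return results

a = {1, 2, 3, 4}
b = {2, 4, 5}
print(guesses_for_p(a, b))
```

Exactly one guess survives, and it assigns to p the set difference a − b, as expected from the stratified rule p(X) ← a(X), ¬b(X).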

Following the same guidelines as in the previous section, we present the technique
and the main results by using a running example.

Example 7 

Consider the query ⟨2col(1, 2), Coloring⟩, checking whether a graph is
3-colorable and whether the colors red and blue can be assigned to nodes 1 and 2,
respectively. The program Coloring consists of the following rules:

r : 2col(X, Y) ← color(X, red), color(Y, blue).
s : color(X, red) ∨ color(X, blue) ∨ color(X, yellow) ← node(X).
c : ← edge(X, Y), color(X, C), color(Y, C).



Definition 6 

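Given a set of constraints $\mathcal{P}_C$, $esv(\mathcal{P}_C)$ denotes the set of Datalog rules obtained
by replacing each constraint in $\mathcal{P}_C$ having the form

$$\leftarrow a_1, \dots, a_k, b_1, \dots, b_m, \neg c_1, \dots, \neg c_n$$

where $a_1, \dots, a_k$ are base atoms, $b_1, \dots, b_m$ are derived atoms and $\neg c_1, \dots, \neg c_n$ are
negated literals (either base or derived), with the following set of rules:

$$b_i \leftarrow b_1, \dots, b_{i-1}, b_{i+1}, \dots, b_m, a_1, \dots, a_k, \neg c_1, \dots, \neg c_n \qquad \forall i \in [1..m]$$

Given a program $\mathcal{P} = (\mathcal{P}_R, \mathcal{P}_C)$, the extended standard version of $\mathcal{P}$, denoted by
$esv(\mathcal{P})$, is defined as $esv(\mathcal{P}) = esv((\mathcal{P}_R, \mathcal{P}_C)) = esv(\mathcal{P}_R) \cup esv(\mathcal{P}_C)$. □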

Example 8

The extended standard version of the program Coloring of Example 7, obtained by
the rewriting of constraints and disjunctive rules and denoted by esv(Coloring), is
as follows:

r1 : 2col(X, Y) ← color(X, red), color(Y, blue).
s1 : color(X, red) ← node(X).
s2 : color(X, blue) ← node(X).
s3 : color(X, yellow) ← node(X).
s4 : color(X, red) ← color(X, blue), node(X).
s5 : color(X, blue) ← color(X, red), node(X).
s6 : color(X, red) ← color(X, yellow), node(X).
s7 : color(X, yellow) ← color(X, red), node(X).
s8 : color(X, blue) ← color(X, yellow), node(X).
s9 : color(X, yellow) ← color(X, blue), node(X).
c1 : color(X, C) ← edge(X, Y), color(Y, C).
c2 : color(Y, C) ← edge(X, Y), color(X, C).

where the rule r1 is derived from r, rules s1 to s9 are derived from the rule s and
rules c1 and c2 are derived from constraint c. □
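
The rewriting of a single constraint (Definition 6) can be sketched as follows. Atoms are plain strings and the caller states which body atoms are base, derived or negated; this encoding and the function name are assumptions made for illustration.

```python
def esv_constraint(base, derived, negated):
    """Turn the constraint <- base, derived, negated into one rule per derived atom.

    Each derived atom b_i becomes the head of a rule whose body holds the
    remaining derived atoms, followed by the base atoms and the negated literals.
    """
    rules = []
    for i, bi in enumerate(derived):
        others = derived[:i] + derived[i + 1:]
        rules.append((bi, others + base + negated))
    return rules

# Constraint c of the Coloring program: <- edge(X,Y), color(X,C), color(Y,C).
c_rules = esv_constraint(["edge(X,Y)"], ["color(X,C)", "color(Y,C)"], [])
for head, body in c_rules:
    print(f"{head} <- {', '.join(body)}.")
```

Up to the order of body atoms, the two printed rules are c1 and c2 of Example 8. A constraint without derived atoms produces no rule, since there is no head through which a binding could be propagated.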

As in the case of disjunctive programs without constraints, $ESV(\mathcal{P})$ denotes the
program derived from $esv(\mathcal{P})$ by replacing each derived predicate symbol $g$ with a
new predicate symbol $G$.

Definition 7 

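Let $\mathcal{P} = (\mathcal{P}_R, \mathcal{P}_C)$ be a Datalog$^{\vee}$ program with constraints. Program $Rew(\mathcal{P})$ is
defined as $RV(\mathcal{P}_R) \cup \mathcal{P}_C \cup ESV(\mathcal{P})$. Given a query $Q = \langle g(t), \mathcal{P} \rangle$, $Rew(Q)$ denotes
the query $\langle g(t), Rew(\mathcal{P}) \rangle$. □

Notice that, with respect to programs without constraints, $RV(\mathcal{P}_R)$ is replaced
by $RV(\mathcal{P}_R) \cup \mathcal{P}_C$ and $ESV(\mathcal{P}_R)$ by $ESV(\mathcal{P}_R) \cup ESV(\mathcal{P}_C)$.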

Example 9 

The set of restricted rules in RV(Coloring), derived from the program Coloring
of Example 7, is as follows:

2col(X, Y) ← 2COL(X, Y), color(X, red), color(Y, blue).
color(X, red) ∨ color(X, blue) ∨ color(X, yellow) ← COLOR(X, red), COLOR(X, blue),
                                                    COLOR(X, yellow), node(X).

where predicates 2COL and COLOR are defined in ESV(Coloring), which is derived
from program esv(Coloring), shown in Example 8, by replacing 2col with 2COL
and color with COLOR. The complete rewriting of the program Coloring consists
of the above rules plus the constraint:

← edge(X, Y), color(X, C), color(Y, C).

and the rules in ESV(Coloring), presented in Example 8. □

The first interesting result is that (as for Datalog$^{\vee}$ programs without constraints)
the above rewriting method does not affect the semantics of the query.

Proposition 2 

Let $\mathcal{P} = (\mathcal{P}_R, \mathcal{P}_C)$ be a Datalog$^{\vee}$ program with constraints, and $Rew(\mathcal{P})$ be the
rewritten version of $\mathcal{P}$. Then, for every atom $g(t)$, $\langle g(t), \mathcal{P} \rangle \equiv \langle g(t), Rew(\mathcal{P}) \rangle$.

Proof. Recall that $Rew(\mathcal{P}) = Rew((\mathcal{P}_R, \mathcal{P}_C)) = RV(\mathcal{P}_R) \cup \mathcal{P}_C \cup ESV(\mathcal{P})$; hence
$Rew(\mathcal{P}) = Rew(\mathcal{P}_R) \cup \mathcal{P}_C \cup ESV(\mathcal{P}_C)$ can also be written.

Let us now consider program $\mathcal{P}' = Rew(\mathcal{P}_R) \cup ESV(\mathcal{P}_C)$; using the same
arguments as for Proposition 1, $\langle g(t), \mathcal{P}_R \rangle \equiv \langle g(t), \mathcal{P}' \rangle$ is derived. In fact, it has already
been shown that adding the atoms defined in $ESV(\mathcal{P}_R)$ inside the body of the rules
of $\mathcal{P}$ does not make any effective restriction, provided that $\mathcal{P}_R$ is a positive program;
moreover, program $ESV(\mathcal{P}_C)$ is also a positive program that possibly enlarges the
unique model $M_{ESV}$ of $ESV(\mathcal{P})$, without affecting the models of $RV(\mathcal{P}_R)$. Finally,
the result follows by observing that the set of constraints $\mathcal{P}_C$ affects the result of
the query only in the case when there is some ground constraint that is not satisfied,
and by the fact that $\mathcal{P}$ and $Rew(\mathcal{P})$ share the same set of constraints. □

Thus, a viable way for reducing the number of models to be computed is to
consider a suitable rewriting of $ESV(\mathcal{P})$, which is able to make an effective and
sound restriction by simulating the binding propagation occurring in top-down
evaluation. This rewriting is carried out by means of the Magic-Set technique,
which in fact limits attention to the models that are really needed for answering
the query.

Definition 8 

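Let $Q = \langle g(t), \mathcal{P} \rangle$ with $\mathcal{P} = (\mathcal{P}_R, \mathcal{P}_C)$; then the disjunctive Magic-Set rewriting of
$\mathcal{P}$ w.r.t. $g(t)$, denoted by $Disj\_Magic(g(t), \mathcal{P})$, is defined as follows:

$RV(\mathcal{P}_R) \cup \mathcal{P}_C \cup Coll(ESV(\mathcal{P}), Magic(G(t), ESV(\mathcal{P}))) \cup Magic(G(t), ESV(\mathcal{P}))$. □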

Example 10 

Consider again the query ⟨2col(1, 2), Coloring⟩ of Example 7.

The program Magic(2COL(1, 2), ESV(Coloring)), obtained by applying the
Magic-Set technique to the query ESV(⟨2col(1, 2), Coloring⟩), is as follows:

magic_2COL^bb(1, 2).

magic_COLOR^bb(X, red) ← magic_2COL^bb(X, Y).
magic_COLOR^bb(Y, blue) ← magic_2COL^bb(X, Y).
magic_COLOR^bb(X, blue) ← magic_COLOR^bb(X, red).
magic_COLOR^bb(X, red) ← magic_COLOR^bb(X, blue).
magic_COLOR^bb(X, yellow) ← magic_COLOR^bb(X, red).
magic_COLOR^bb(X, red) ← magic_COLOR^bb(X, yellow).
magic_COLOR^bb(X, blue) ← magic_COLOR^bb(X, yellow).
magic_COLOR^bb(X, yellow) ← magic_COLOR^bb(X, blue).
magic_COLOR^bb(Y, C) ← magic_COLOR^bb(X, C), edge(X, Y).
magic_COLOR^bb(X, C) ← magic_COLOR^bb(Y, C), edge(X, Y).

COLOR^bb(X, red) ← magic_COLOR^bb(X, red), node(X).
COLOR^bb(X, blue) ← magic_COLOR^bb(X, blue), node(X).
COLOR^bb(X, yellow) ← magic_COLOR^bb(X, yellow), node(X).
COLOR^bb(X, red) ← magic_COLOR^bb(X, red), COLOR^bb(X, blue).
COLOR^bb(X, blue) ← magic_COLOR^bb(X, blue), COLOR^bb(X, red).
COLOR^bb(X, red) ← magic_COLOR^bb(X, red), COLOR^bb(X, yellow).
COLOR^bb(X, yellow) ← magic_COLOR^bb(X, yellow), COLOR^bb(X, red).
COLOR^bb(X, blue) ← magic_COLOR^bb(X, blue), COLOR^bb(X, yellow).
COLOR^bb(X, yellow) ← magic_COLOR^bb(X, yellow), COLOR^bb(X, blue).
COLOR^bb(X, C) ← magic_COLOR^bb(X, C), edge(X, Y), COLOR^bb(Y, C).
COLOR^bb(Y, C) ← magic_COLOR^bb(Y, C), edge(X, Y), COLOR^bb(X, C).

2COL^bb(X, Y) ← magic_2COL^bb(X, Y), COLOR^bb(X, red), COLOR^bb(Y, blue).

while the rules in Coll(ESV(Coloring), Magic(2COL(1, 2), ESV(Coloring))) are:

2COL(X, Y) ← 2COL^bb(X, Y).
COLOR(X, red) ← COLOR^bb(X, red).
COLOR(X, blue) ← COLOR^bb(X, blue).
COLOR(X, yellow) ← COLOR^bb(X, yellow). □

Before formally presenting the correctness of the rewriting technique, let us
summarize the process and make some comments. The rewriting process takes as input
a query $Q = \langle g(t), \mathcal{P} \rangle = \langle g(t), (\mathcal{P}_R, \mathcal{P}_C) \rangle$ and first generates the equivalent query
$Q' = \langle g(t), (RV(\mathcal{P}_R) \cup ESV(\mathcal{P}), \mathcal{P}_C) \rangle$, where $ESV(\mathcal{P})$ is a normal program. Next,
the query $Q'' = \langle g(t), (RV(\mathcal{P}_R) \cup \mathcal{P}', \mathcal{P}_C) \rangle$ is produced, where $\mathcal{P}'$ is the optimized
program derived from $ESV(\mathcal{P})$. To answer the source query $Q$ it is sufficient to
consider the minimal models of $\mathcal{P}_R$ which satisfy $\mathcal{P}_C$. As the program $ESV(\mathcal{P})$
(resp. $\mathcal{P}'$) may contain negated literals, to answer the rewritten query $Q'$ (resp. $Q''$)
the stable models of $RV(\mathcal{P}_R) \cup ESV(\mathcal{P})$ (resp. $RV(\mathcal{P}_R) \cup \mathcal{P}'$) satisfying $\mathcal{P}_C$ have
to be computed. Moreover, under the assumption that a predicate symbol $p$
cannot appear both positively and negatively in two different constraints, the program
$ESV(\mathcal{P})$ is stratified and therefore has a unique stable model (namely the perfect
or stratified model). Therefore, in order to answer the query $Q'$ it is sufficient to
compute the perfect model $M$ of $ESV(\mathcal{P})$ and then to compute the minimal models
of $RV(\mathcal{P}_R) \cup M$ satisfying $\mathcal{P}_C$. For unstratified $ESV(\mathcal{P})$ the complete set of
stable models has to be considered. However, as already mentioned, problems in the
second level of the polynomial hierarchy can be expressed by means of Datalog$^{\vee}$
programs with restricted constraints.



4.1 Query Equivalence Results

For the sake of presentation, Figure 1 explicitly points out the main steps of the
whole algorithm presented in this section. The algorithm takes
as input a query $\langle g(t), \mathcal{P} \rangle$ and a database $D$, and outputs the set of stable models
of $Disj\_Magic(g(t), \mathcal{P})_D$; obviously, this set can be used for answering the query
under both the possible and certain semantics.

Input: A query $Q = \langle g(t), \mathcal{P} \rangle$ with $\mathcal{P} = (\mathcal{P}_R, \mathcal{P}_C)$, a database $D$;
Output: The stable models of $(Disj\_Magic(g(t), \mathcal{P}))_D$.
var: ESV, RV, Coll: set of rules;
begin
    let ESV := ∅, RV := ∅, Coll := ∅;

    //*** generation of the restricted version ***
    for each rule $r \in \mathcal{P}_R$ of the form $a_1(t_1) \vee \dots \vee a_n(t_n) \leftarrow B$ do
        insert $a_1 \vee \dots \vee a_n \leftarrow A_1, \dots, A_n, B$ into RV;

    //*** generation of the extended standard version ***
    for each rule $r \in \mathcal{P}_R$ of the form $a_1(t_1) \vee \dots \vee a_n(t_n) \leftarrow B$ do begin
        insert $a_i \leftarrow B$ for $1 \le i \le n$ into ESV;
        insert $a_i \leftarrow a_j, B$ for $1 \le i, j \le n$ and $i \ne j$ into ESV;
    end
    for each constraint $c \in \mathcal{P}_C$ of the form $\leftarrow a_1, \dots, a_k, b_1, \dots, b_m, \neg c_1, \dots, \neg c_n$ do
        insert $b_i \leftarrow b_1, \dots, b_{i-1}, b_{i+1}, \dots, b_m, a_1, \dots, a_k, \neg c_1, \dots, \neg c_n$ for $1 \le i \le m$ into ESV;

    //*** application of the Magic-Set technique to a normal program ***
    let Magic := $Magic(G(t), ESV(\mathcal{P}))$;

    //*** generation of the collecting rules ***
    for each predicate symbol $p$ defined in $\mathcal{P}$ with arity $k$ do
        for each adornment $\alpha$ of $p$ in Magic do
            insert $p(X_1, \dots, X_k) \leftarrow p^{\alpha}(X_1, \dots, X_k)$ into Coll;

    return $\mathcal{SM}(RV \cup \mathcal{P}_C \cup Coll \cup Magic \cup D)$;
end.



Fig. 1. Algorithm Extended Magic-Set (Magic-Partial) 

For Datalog$^{\vee}$ programs without constraints the correctness of the algorithm in Fig. 1
follows from Fact 1. For Datalog$^{\vee}$ programs with constraints, its correctness (under
proper assumptions) will be provided below.

First of all observe that, as in the case of disjunctive queries without constraints,
the application of the Magic-Set technique to queries with constraints produces a
query that can be evaluated more efficiently. Unfortunately, for Datalog$^{\vee}$ queries
with constraints the technique previously described produces a query that, generally,
is not equivalent to the original one. This is due to the fact that the Magic-Set
technique not only focuses on the models really relevant for answering the query, but
also computes part of the models and not models in their entirety. In contrast, constraints
express conditions that must hold for every ground instance of the program,
including atoms which are not relevant to the query. This observation will be clearer after
the following example.

Example 11 

Consider the query ⟨2col(1, 2), Coloring⟩ applied to the program of Example 7 on
the graph shown in Figure 2, consisting of two disconnected components, say $C_1$
and $C_2$. Since both nodes 1 and 2 belong to component $C_1$, there is no way of
propagating bindings from 1 and 2 into the component $C_2$. This means that the
query goal only depends on component $C_1$ and the colorability of $C_2$ affects the
result only in the case when the component $C_2$ is not colorable. □

Fig. 2. A graph with two components, with $C_2$ not being 3-colorable.

The above example suggests that the original query $Q = \langle g(t), \mathcal{P} \rangle$ and the query
$Q'$, obtained by applying the Magic-Set technique to $Q$, are equivalent if the
program $\mathcal{P}_D$ admits stable models. Before formally proving such an intuition, some
preliminary definitions are provided.

Given a Datalog^ program with constraints V and an interpretation A'^ for V, then 
V/N denotes the set of rules in groundiV) which are true w.r.t. N , and V / N denotes 
the set of rules in ground{V) which are false w.r.t. A^, i.e. V/N = ground {V) — V / N . 

In order to capture the meaning of the application of the Magic-Set technique in 
the case of disjunctive rules with constraints, use is made of the following concept. 

Definition 9 

Let $\mathcal{P}$ be a Datalog$^{\vee}$ program with constraints and $D$ a database, and let $M$ be an
interpretation for $\mathcal{P}_D$; then $M$ is a pre-model for $\mathcal{P}_D$ if there exists $N \in \mathcal{SM}(\mathcal{P}_D)$
such that $M \subseteq N$. □

In the case of Datalog$^{\vee}$ programs without constraints, every model of the program
rewritten by means of the Magic-Set technique can be extended to be a model of the
original program, i.e. given a program $\mathcal{P}_D$, every model of $Disj\_Magic(g(t), \mathcal{P}_D)$
restricted to the predicates in $\mathcal{P}_D$ is a pre-model for $\mathcal{P}_D$.

Lemma 2 

Let $\langle g(t), \mathcal{P}_R \rangle$ be a disjunctive Datalog query without constraints. Then, for any
database $D$, $M \in \mathcal{SM}(Disj\_Magic(g(t), (\mathcal{P}_R)_D))$ if, and only if, there exists $N \in
\mathcal{SM}(\mathcal{P}_D)$ such that $M[(\mathcal{P}_R)_D] \subseteq N$.

Proof. From Fact 1 it is known that $\langle g(t), \mathcal{P}_R \rangle \equiv \langle g(t), Disj\_Magic(g(t), \mathcal{P}_R) \rangle$.
Indeed, for any model $M \in \mathcal{SM}(Disj\_Magic(g(t), (\mathcal{P}_R)_D))$ the program
$(\mathcal{P}_R)_D / M[(\mathcal{P}_R)_D]$ consists of a set of ground rules that are false in $(\mathcal{P}_R)_D$ only
because they are not necessary for answering the query, i.e. they are not used for
propagating the binding. Thus, there exists a way of extending model $M[(\mathcal{P}_R)_D]$
into a new model $N$ for $(\mathcal{P}_R)_D$ that satisfies the above rules. □

The above lemma states that atoms which are not inferable from the source program
cannot be inferred from the rewritten program either, apart from those introduced for
binding propagation and to collect atoms with different adornments.

In the case of a program $\mathcal{P}$ with constraint rules, the above observation does
not hold, as it is not always the case that, for a given program $\mathcal{P}_D$, every model of
$Disj\_Magic(g(t), \mathcal{P}_D)$ restricted to the predicates in $\mathcal{P}_D$ is a pre-model for $\mathcal{P}_D$.

Lemma 3 

There exists a query $\langle g(t), \mathcal{P} \rangle$, with $\mathcal{P} = (\mathcal{P}_R, \mathcal{P}_C)$ and $\mathcal{P}_C \ne \emptyset$, a database $D$ and
a model $M$ in $\mathcal{SM}(Disj\_Magic(g(t), \mathcal{P}_D))$ such that $M[\mathcal{P}_D]$ is not a pre-model for
$\mathcal{P}_D$.

Proof. Consider the query ⟨2col(1, 2), Coloring⟩ presented in Example 7. Consider
the database $D$ modelling the graph in Example 11.

Then, $\mathcal{P}_D$ does not have stable models, since the graph is not 3-colorable.
However, $\mathcal{SM}(Disj\_Magic(2col(1, 2), \mathcal{P}_D)) \ne \emptyset$, since the component of the graph
containing both nodes 1 and 2 is 3-colorable; any such legal coloring corresponds in
fact to a stable model. □
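
The situation used in the proof can be reproduced by brute force. Since the graph of Figure 2 is not available here, the sketch below uses a hypothetical graph with the same property: a component C1 containing nodes 1 and 2 that is 3-colorable, and a component C2 (a clique on four nodes) that is not.

```python
from itertools import combinations, product

COLORS = ("red", "blue", "yellow")

def colorings(nodes, edges):
    """All guesses of the disjunctive rule that satisfy the edge constraint."""
    nodes = sorted(nodes)
    out = []
    for guess in product(COLORS, repeat=len(nodes)):
        col = dict(zip(nodes, guess))
        if all(col[x] != col[y] for x, y in edges):
            out.append(col)
    return out

# Hypothetical stand-in for Figure 2: C1 is a path 1-2-3, C2 is a 4-clique.
c1_nodes, c1_edges = {1, 2, 3}, [(1, 2), (2, 3)]
c2_nodes = {4, 5, 6, 7}
c2_edges = list(combinations(sorted(c2_nodes), 2))

whole = colorings(c1_nodes | c2_nodes, c1_edges + c2_edges)
only_c1 = colorings(c1_nodes, c1_edges)
# Colorings of C1 in which 2col(1, 2) holds: node 1 is red and node 2 is blue.
hits = [c for c in only_c1 if c[1] == "red" and c[2] == "blue"]
print(len(whole), len(only_c1), len(hits))
```

The whole graph admits no legal coloring, so the source program has no stable model, whereas the component reached by the bindings admits colorings in which 2col(1, 2) holds. None of these partial colorings can be extended to a stable model of the source program.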

Hence, the Magic-Set technique (and any other similar technique to propagate
bindings) for Datalog$^{\vee}$ programs with constraints does not produce a query
equivalent to the original one. Nonetheless, it is natural to investigate some restrictions
that may guarantee the soundness and/or the completeness of the answers.

Theorem 1 

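Let $Q = \langle g(t), \mathcal{P} \rangle$ be a query, where $\mathcal{P} = (\mathcal{P}_R, \mathcal{P}_C)$ is a Datalog$^{\vee}$ program with
constraints, and $D$ a database. Then, for each model $M'$ of $\mathcal{P}_D$, there exists $M \in
\mathcal{SM}(Disj\_Magic(g(t), \mathcal{P}_D))$, with $M[\mathcal{P}_D] \subseteq M'$ being a pre-model for $\mathcal{P}_D$.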

Proof. Recall that $Disj\_Magic(g(t), (\mathcal{P}_R, \mathcal{P}_C))$ is defined as $RV(\mathcal{P}_R) \cup
\mathcal{P}_C \cup Coll(ESV(\mathcal{P}), Magic(G(t), ESV(\mathcal{P}))) \cup Magic(G(t), ESV(\mathcal{P}))$. Given two
programs $S_1$ and $S_2$ such that $S_2 = RV(S_1)$, we also denote $S_1$ as $RV^{-1}(S_2)$.

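Let $\mathcal{P}' = Disj\_Magic(g(t), \mathcal{P}) - \mathcal{P}_C$. Then, $\langle g(t), \mathcal{P}' \rangle \equiv \langle g(t), \mathcal{P}_R \rangle \equiv
\langle g(t), Disj\_Magic(g(t), \mathcal{P}_R) \rangle$, as the additional rules in $\mathcal{P}'$ do not make any
restrictions on the ground rules in $\mathcal{P}_R$ used to derive the goal $g(t)$. The program
$\mathcal{P}_1 = ground(\mathcal{P}')$ consists of three distinct sets: $\mathcal{P}_{11} = ground(RV(\mathcal{P}_R))$,
containing restricted rules, $\mathcal{P}_{12} = ground(Coll(ESV(\mathcal{P}), Magic(G(t), ESV(\mathcal{P}))))$,
containing collecting rules, and $\mathcal{P}_{13} = ground(Magic(G(t), ESV(\mathcal{P})))$, containing
adorned rules.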

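Consider now program $\mathcal{P}_2 = ground(\mathcal{P}_R) - RV^{-1}(\mathcal{P}_{11})$, containing all the rules
in $ground(\mathcal{P}_R)$ not relevant for the query goal $g(t)$. It is obvious that, letting $\mathcal{P}_3 =
ground(\mathcal{P}_R) - \mathcal{P}_2$, we have that $\langle g(t), \mathcal{P}_3 \rangle \equiv \langle g(t), \mathcal{P}_1 \rangle$, as both contain all the ground
rules relevant for the query goal.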

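Therefore, for any database $D$, $\mathcal{SM}(\mathcal{P}_D) = \mathcal{SM}(ground(\mathcal{P})_D) = \mathcal{SM}((\mathcal{P}_3 \cup
\mathcal{P}_C)_D) \times \mathcal{SM}((\mathcal{P}_2 \cup \mathcal{P}_C)_D)$, where for any two sets of stable models $A$ and $B$,

$$A \times B = \{ M_1 \cup M_2 \mid M_1 \in A \wedge M_2 \in B \}.$$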

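Moreover, since $\langle g(t), \mathcal{P}_3 \cup \mathcal{P}_C \rangle \equiv \langle g(t), \mathcal{P}_1 \cup \mathcal{P}_C \rangle$, as the constraints act on atoms
derived from both $\mathcal{P}_3$ and $\mathcal{P}_1$, and $\langle g(t), \mathcal{P}_1 \cup \mathcal{P}_C \rangle \equiv \langle g(t), \mathcal{P}' \cup \mathcal{P}_C \rangle$, we conclude
that $\langle g(t), \mathcal{P}_3 \cup \mathcal{P}_C \rangle \equiv \langle g(t), Disj\_Magic(g(t), \mathcal{P}_D) \rangle$.

Hence, given a model $M'$ of $\mathcal{P}_D$, there exist (i) $M \in \mathcal{SM}(Disj\_Magic(g(t), \mathcal{P}_D))$
and (ii) $M'' \in \mathcal{SM}((\mathcal{P}_2 \cup \mathcal{P}_C)_D)$ such that $M' = M[\mathcal{P}_D] \times M''$. □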

The above theorem can be restated in the following more explicative form. 

Corollary 1 

Let Q! = {g{t),V = {'Pr,'Pc)) be a query, _D be a database, and Q = 

{g{t),Disj.Magic{g{t),V)) then 

- Ansb{Q',D) C Ansb{Q,D), and 

Proof. The relation $Ans_b(Q', D) \subseteq Ans_b(Q, D)$ straightforwardly derives from the above theorem. For the answer under cautious semantics, we distinguish two cases. If $SM(\mathcal{P}_D) = \emptyset$, then any ground atom is trivially in $Ans_c(Q', D)$, and hence $Ans_c(Q', D) \supseteq Ans_c(Q, D)$. Assume, then, that $SM(\mathcal{P}_D) \neq \emptyset$. In this case, $Ans_c(Q', D) = \bigcap_{M'} A(g, M')$, where $A(g, M')$ denotes the set of substitutions for the variables in $g$ such that $g$ is true in $M'$, for each stable model $M'$ of $\mathcal{P}_D$. Since, for each stable model $M'$, there exists $M \in SM(Disj\_Magic(g(t), \mathcal{P}_D))$ with $M[\mathcal{P}_D] \subseteq M'$, it follows that $Ans_c(Q', D) = \bigcap_{M'} A(g, M') = \bigcap_{M} A(g, M[\mathcal{P}_D])$. As $Disj\_Magic(g(t), \mathcal{P}_D)$ might contain additional stable models w.r.t. those of $\mathcal{P}_D$, $Ans_c(Q, D)$ is a subset of $\bigcap_{M} A(g, M[\mathcal{P}_D])$. Hence, we have that $Ans_c(Q', D) = \bigcap_{M} A(g, M[\mathcal{P}_D]) \supseteq Ans_c(Q, D)$. □
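The two answer modes used in the corollary can be illustrated with a minimal sketch. It assumes that stable models are given as Python sets of ground atoms encoded as tuples, and that an answer is the argument tuple of the goal predicate; the names `matches`, `brave_answers` and `cautious_answers`, as well as the sample models, are hypothetical and not taken from the paper.

```python
from functools import reduce

def matches(goal_pred, model):
    """Argument tuples t such that goal_pred(t) is true in the model."""
    return {atom[1:] for atom in model if atom[0] == goal_pred}

def brave_answers(goal_pred, models):
    """Answers true in at least one stable model (union over models)."""
    return set().union(*(matches(goal_pred, m) for m in models))

def cautious_answers(goal_pred, models):
    """Answers true in every stable model (intersection over models).

    With no stable models every ground atom is trivially an answer;
    this sketch signals that case with None instead of an infinite set.
    """
    if not models:
        return None
    return reduce(set.intersection, (matches(goal_pred, m) for m in models))

# Hypothetical stable models of an original program ...
original = [{("g", 1), ("g", 2)}, {("g", 1)}]
# ... and of a rewritten program that admits one additional model.
rewritten = original + [{("g", 3)}]

print(brave_answers("g", original), brave_answers("g", rewritten))
print(cautious_answers("g", original), cautious_answers("g", rewritten))
```

On this toy input the additional model can only enlarge the brave answers and shrink the cautious ones, which is the direction of the two inclusions stated in the corollary.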

The above result sheds some light on the effectiveness of the Magic-Set technique for disjunctive programs with constraints. Indeed, the rewriting is both bravely complete, i.e., under the brave semantics it is guaranteed to compute all the answers for the original query, and cautiously sound, i.e., under the cautious semantics it is guaranteed that no false answers are computed. Since we cannot prove that soundness and completeness hold at the same time under either semantics, the rewriting algorithm presented earlier will also be called the Magic-Partial algorithm.

A natural extension, called the Magic-Total algorithm, is shown in Figure 3. It consists of a first application of Magic-Partial, followed by an evaluation of the stable models of the part of the program in which the binding has not been propagated. It is worth noting that the Magic-Total algorithm returns the set of all the models; however, it is almost trivial to modify the algorithm in order to implement the possible and certain semantics in a more direct and efficient way.
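The control flow of Magic-Total can be sketched as follows. This is only an illustration under stated assumptions: the stable-model computations are passed in as callables, $M[\mathcal{P}_D]$ is assumed to be the restriction of $M$ to atoms of the original program, and the stand-in functions `fake_magic_partial`, `drop_magic` and `fake_residual` are hypothetical placeholders for a real engine.

```python
from typing import Callable, FrozenSet, Iterable, Set

Model = FrozenSet[str]

def product(a: Iterable[Model], b: Iterable[Model]) -> Set[Model]:
    """A x B: all pairwise unions of a model from A with a model from B."""
    b = list(b)
    return {m1 | m2 for m1 in a for m2 in b}

def magic_total(
    magic_partial: Callable[[], Iterable[Model]],
    restrict: Callable[[Model], Model],
    residual_models: Callable[[Model], Iterable[Model]],
) -> Set[Model]:
    """Extend each partial model with the stable models of the residual program."""
    result: Set[Model] = set()
    for m in magic_partial():
        core = restrict(m)  # plays the role of M[P_D]
        result |= product([core], residual_models(core))
    return result

# Hypothetical stand-ins for a real solver (assumptions of this sketch).
def fake_magic_partial():
    return [frozenset({"magic_p(1)", "p(1)"})]

def drop_magic(m):
    return frozenset(a for a in m if not a.startswith("magic_"))

def fake_residual(core):
    return [frozenset({"p(2)"}), frozenset({"q(2)"})]

models = magic_total(fake_magic_partial, drop_magic, fake_residual)
print(sorted(sorted(m) for m in models))
```

Passing the solver steps as parameters keeps the sketch independent of any particular engine; only the loop and the product operator mirror the algorithm in the figure.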

We conclude by observing that applying the Magic-Total algorithm (rather than Magic-Partial) produces an overhead in query answering. The experiments presented in the next section quantify this overhead.



5 Experimental results 

In this section some experimental results are presented to give an idea of the improvements which can be obtained by means of this technique; the proposed algorithm has been included in the system presented in (Greco, 2003), and all the experiments have been carried out by means of the DLV system (Leone et al., 2002) on a PC with a 1.7 GHz Pentium 4 and 512 Mbyte of RAM, under the Linux operating system.

Input: A disjunctive Datalog query $Q = (g(t), \mathcal{P})$, a database $D$;
Output: The set of stable models of $\mathcal{P}_D$ in which $g(t)$ is evaluated true;
var: $\mathcal{M}$, $\mathcal{MP}$: set of models;
begin
    let $\mathcal{M} := \emptyset$;
    let $\mathcal{MP} := MagicPartial((g(t), \mathcal{P}), D)$;
    for each $M \in \mathcal{MP}$ do
        let $\mathcal{M} := \mathcal{M} \cup M[\mathcal{P}_D] \times SM(\mathcal{P}_D / M[\mathcal{P}_D])$;
    return $\mathcal{M}$;
end.

Fig. 3. Algorithm Magic-Total

It should be pointed out that this proposal is neither a new evaluation strategy nor a new implementation, but an optimization useful for efficiently evaluating bound queries in bottom-up engines; in fact, the contribution lies in having formally proved that the Magic-Set rewriting can be extended to deal with constraint rules, and, hence, implementation issues are a subject for further research.

Nonetheless, an experimental evaluation is worthwhile: some classical decision and optimization problems have been considered in "extreme" situations, representative of a wide spectrum of real cases, in which the improvements of our technique are negligible and highly evident, respectively.

Note that the timings considered in all the following experimental results do not 
include the time for the rewriting. In any case, this time does not affect the overall 
result as an exponential speedup in the execution times is obtained between the 
source and optimized (rewritten) versions. 

SIMPLE EXAMPLES. Consider the disjunctive program $\mathcal{P}_1$ consisting of the following rule:

p(X) ∨ q(X) ← a(X, Y)

and the program $\mathcal{P}_2$ obtained by adding to $\mathcal{P}_1$ the constraint

← p(X), a(X, Y), q(Y), X < h

Figure 4(i) shows the results obtained by considering the query $(p(1), \mathcal{P}_1)$, evaluated over the database $D$ consisting of the set of facts $a(1, 2), a(2, 3), \ldots, a(k, k+1)$. The figure presents the execution time for the source program and for the optimized version, obtained by applying the rewriting technique for positive queries. The number of facts in the database is shown on the x-axis, while the y-axis shows the time taken to evaluate the query (in seconds). The improvement of the optimized query is extremely high (observe that the scale of the y-axis is logarithmic), and is due to the fact that the optimized version propagates the binding of the query $p(1)$, with the effect of reducing the number of models to be computed from exponential to constant.

Fig. 4. Results for the query of (i) Example 1, and (ii) Example 2

In Figure 4(ii) the results obtained by evaluating the query $(p(1), \mathcal{P}_2)$ are presented. Comparing the results shown in Figure 4(i) and Figure 4(ii), it can be observed that also in this case there is an exponential speedup between the optimized query and the source query, as the number of ground rules in the optimized version is still constant.
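The exponential-versus-constant behaviour for the first program can be reproduced on small instances with a brute-force sketch. It relies on the fact that the stable models of a positive disjunctive program are its minimal models; the helper names are hypothetical, and restricting the ground rules to those with X equal to the query constant is a simplification that mimics the effect of binding propagation on this particular program, not the rewriting itself.

```python
from itertools import combinations

def chain_db(k):
    """Facts a(1,2), a(2,3), ..., a(k,k+1)."""
    return [(i, i + 1) for i in range(1, k + 1)]

def ground_heads(facts, bound=None):
    """Ground instances of p(X) v q(X) <- a(X,Y), reduced to their heads.

    If `bound` is given, keep only the instances with X == bound,
    mimicking the effect of propagating the binding of query p(bound).
    """
    return [frozenset({("p", x), ("q", x)})
            for x, _ in facts if bound is None or x == bound]

def minimal_models(heads):
    """Enumerate minimal sets of atoms that satisfy every ground head."""
    atoms = sorted(set().union(*heads))
    found = []
    for size in range(len(atoms) + 1):          # smallest candidates first
        for cand in combinations(atoms, size):
            s = frozenset(cand)
            # s is a model if it intersects every head; it is minimal if
            # no previously found (smaller) model is contained in it
            if all(s & h for h in heads) and not any(m < s for m in found):
                found.append(s)
    return found

for k in (1, 2, 3, 4):
    facts = chain_db(k)
    print(k,
          len(minimal_models(ground_heads(facts))),
          len(minimal_models(ground_heads(facts, bound=1))))
```

The second column of the printed table doubles with each additional fact, while the third stays fixed, which is the behaviour the logarithmic plots report for the source and optimized programs.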

5.1 Search problems 

For the following queries, graphs have been used having the structure depicted in Figure 5, with base = height and output grade equal to 3 and 2, respectively. Here base denotes the number of nodes in the same layer, height the number of nodes in the same column, and grade the number of arcs starting from each node not belonging to the top layer (or, equivalently, the number of arcs ending at every node not belonging to the bottom layer). The number of nodes in the graph is $base \times height$, and the number of arcs is $(base - 1) \times (height - 1) \times grade + (base - 1) + (height - 1)$.



3-COLORING. The query of Example 7 is considered with input graphs consisting of two disconnected components of variable size; the results are shown in Fig. 6, where the computation of the source query is compared with that of the query rewritten using both the Magic-Partial and the Magic-Total algorithms. In particular, in Fig. 6(i) the two components are of very different size, and the nodes in the query goal belong to the larger one. The graph shows the execution times as the size of the larger component changes. Fig. 6(ii) presents the same experiments repeated using two components with the same number of nodes.

Note that in the first experiment, whose results are shown in Fig. 6(i), there is no difference between the response time of the two optimized versions because the Magic-Set technique propagates the binding in the larger component, which dominates the size of the graph. For both optimized programs the advantage with respect to the source program is evident, since the size of the component on which the binding does not propagate is quite irrelevant. In Fig. 6(ii) graphs with two components of the same size are considered and, therefore, obtaining a full solution (i.e. Magic-Total) requires almost twice the time required for the partial solution (i.e. Magic-Partial).

Fig. 5. Graph structures.

Fig. 6. Execution time for the 3-coloring problem.



k-DOMINATING SET. Given a graph $G = (V, E)$, a subset of the vertex set $V' \subseteq V$ is a dominating set if for all $u \in V - V'$ there is a $v \in V'$ for which $(u, v) \in E$. The k-dominating set problem consists in finding a partition of the nodes into $k$ disjoint dominating sets $V_1, \ldots, V_k$ for $G$. The 3-Dominating Set problem, denoted by 3DS, can be formalized by means of the following set of logic rules:



v1(X) ∨ nv1(X) ← node(X).
v2(X) ∨ nv2(X) ← node(X).
v3(X) ∨ nv3(X) ← node(X).

← v1(X), v2(X).
← v1(X), v3(X).
← v2(X), v3(X).

← nv1(X), not connected1(X).
← nv2(X), not connected2(X).
← nv3(X), not connected3(X).

connected1(Y) ← v1(X), edge(X, Y).
connected2(Y) ← v2(X), edge(X, Y).
connected3(Y) ← v3(X), edge(X, Y).

Note that the first group of rules and the strong constraints induce a partition of the nodes into 3 disjoint sets, v1, v2 and v3.
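The property that the rules above check can also be stated as a small procedural test. The sketch below is an illustration only: the function names are hypothetical, and edges are treated as undirected, which is an assumption of this example rather than something fixed by the logic program.

```python
def is_dominating(nodes, edges, subset):
    """True if every node outside `subset` has a neighbour inside it."""
    adj = {n: set() for n in nodes}
    for u, v in edges:            # edges treated as undirected (assumption)
        adj[u].add(v)
        adj[v].add(u)
    return all(adj[n] & subset for n in nodes - subset)

def is_dominating_partition(nodes, edges, parts):
    """True if `parts` are pairwise disjoint, cover `nodes`, and each dominates."""
    covered = set().union(*parts)
    disjoint = sum(len(p) for p in parts) == len(covered)
    return (disjoint and covered == nodes
            and all(is_dominating(nodes, edges, p) for p in parts))

triangle_nodes = {1, 2, 3}
triangle_edges = [(1, 2), (2, 3), (1, 3)]
path_edges = [(1, 2), (2, 3)]

# In a triangle every single node dominates the other two.
print(is_dominating_partition(triangle_nodes, triangle_edges, [{1}, {2}, {3}]))
# In the path 1-2-3, node 1 does not dominate node 3.
print(is_dominating_partition(triangle_nodes, path_edges, [{1}, {2}, {3}]))
```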

The query $((v1(1), v2(2), v3(3)), 3DS)$ is supplied on a graph $G$ consisting of two components $C_1$ and $C_2$. The size of $C_1$ is fixed and it is assumed that this component contains nodes 1, 2 and 3. In the experiments the size of $C_2$ was varied and the response time measured for the source program described above and for the optimized program produced by the rewriting algorithm.

The results, presented in Fig. 7, show that the execution time of the optimized program is not affected by the size of $C_2$, as there is no way of propagating the binding from nodes 1, 2 and 3 into $C_2$.



[Figure 7 plot: execution time versus number of nodes for the source program and the optimized program.]

Fig. 7. Execution time for the 3-Dominating Set problem. 



5.2 Optimization problems 

In this section the possibility of using Magic-Set techniques for the optimization of queries over disjunctive Datalog programs is explored. In order to express optimization problems as well, we follow the approach used in the DLV system and consider weak constraints in addition to strong constraints. Weak constraints represent constraints which should be respected but which, if they cannot be enforced, only invalidate the portion of the program which they are concerned with (Greco, 1998). Therefore, strong constraints express a set of conditions that have to be satisfied, while weak constraints express a set of desirable conditions that may be violated; their informal semantics is to minimize the number of violated instances.

A weak constraint is a rule of the form $\Leftarrow b_1, \ldots, b_k, \neg b_{k+1}, \ldots, \neg b_{k+m}$. Given a program $P \cup W$, where $P$ is a set of rules and $W$ a set of weak constraints, an interpretation $M$ is a stable model for $P \cup W$ if $M$ is a stable model for $P$ which minimizes the number of rules not satisfied in $ground(W)$. Thus, the stable models of $P$ can be ordered w.r.t. the number of weak constraints not satisfied in $ground(W)$; the preferred stable models are those which minimize this number.

Like strong constraints, weak constraints are not used to infer atoms, but only to check that the computed set verifies a given property. In (Buccafurri et al., 2000; Greco, 1998) it is proved that the introduction of weak constraints allows the solution of optimization problems, since each weak constraint can be regarded as an "objective function" of an optimization problem.

Example 12 

Given a graph $G = (V, E)$, defined by means of the unary predicate node and the binary predicate edge, we can model the MAX_CLIQUE problem, asking for the clique of $G$ having maximum size, by means of the following disjunctive Datalog program with both strong and weak constraints:

c(X) ∨ nc(X) ← node(X).

← c(X), c(Y), X ≠ Y, not edge(X, Y).

⇐ nc(X).

Note that the first rule is used for creating all possible partitions of the nodes into c and nc, and the second one (i.e. the strong constraint) is used for ensuring that c is a clique, i.e. each pair of nodes in the clique must be connected by an edge, while the weak constraint minimizes the number of vertices that are not in the clique or, equivalently, maximizes the size of c.

Consider the query $(c(1), MAX\_CLIQUE)$ over the graph depicted in the figure, asking whether node 1 belongs to a clique of maximum size. It is easy to see that nodes 1, 9 and 10 form a clique of size 3, which is also the size of the maximum clique in the component $C_1$. Component $C_2$ is a clique of size 4 and, hence, the above query is false. In contrast, the query $(c(12), MAX\_CLIQUE)$, asking whether node 12 belongs to a clique of maximum size, is evaluated true.

Now, observe that both the above queries are evaluated true when the Magic-Set rewriting is used; in fact, when we ask whether node 1 belongs to a clique of maximum size, there is no way of propagating the binding into component $C_2$, where the maximum clique actually is. □

From the above example, it is clear that any query optimization technique applied 
in the presence of weak constraints, will eventually lead to a different semantics 
consisting in a 'local' optimization, rather than in a global one. 
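The difference between the global and the 'local' reading can be checked by brute force on a small graph. The graph below is hypothetical (a component containing the triangle 1, 9, 10 plus a pendant node 2, and a second component that is a 4-clique on nodes 11 to 14); it only imitates the situation of the example and is not the graph of the paper's figure. The function names are likewise illustrative.

```python
from itertools import combinations

def is_clique(cand, edge_set):
    """True if every pair of nodes in `cand` is joined by an edge."""
    return all(frozenset(pair) in edge_set for pair in combinations(cand, 2))

def in_max_clique(node, nodes, edges):
    """True if `node` belongs to some clique of maximum size in the graph."""
    edge_set = {frozenset(e) for e in edges}
    ordered = sorted(nodes)
    for size in range(len(ordered), 0, -1):     # largest size first
        cliques = [c for c in combinations(ordered, size)
                   if is_clique(c, edge_set)]
        if cliques:
            return any(node in c for c in cliques)
    return False

def component(start, edges):
    """Nodes reachable from `start`, i.e. where a binding could propagate."""
    seen, frontier = {start}, [start]
    while frontier:
        n = frontier.pop()
        for u, v in edges:
            if n in (u, v):
                other = v if u == n else u
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return seen

c1_edges = [(1, 9), (1, 10), (9, 10), (2, 1)]
c2_edges = list(combinations(range(11, 15), 2))
edges = c1_edges + c2_edges
nodes = {1, 2, 9, 10, 11, 12, 13, 14}

comp = component(1, edges)
local_edges = [e for e in edges if e[0] in comp]

print(in_max_clique(1, nodes, edges))       # global optimum
print(in_max_clique(12, nodes, edges))      # global optimum
print(in_max_clique(1, comp, local_edges))  # optimum within node 1's component
```

Restricting the search to the component reachable from node 1 changes the answer for that node, which is the 'local' optimization effect described above.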

In many circumstances, this semantics can also be desirable. In all other circumstances, the global solution can be simply obtained by comparing the different solutions obtained while making local optimizations, by exploiting the same approach used in developing the Magic-Total algorithm.

The extension of the technique to deal also with weak constraints is outside the scope of this paper. Therefore, in the next two paragraphs some hints for future research are presented. In particular, two optimization problems are considered where the propagation of bindings defines a partition of the input graph $G$ into two separated components $G_1$ and $G_2$, and the optimal solution can be obtained by first computing an optimal solution using component $G_1$ and next finding the global optimal solution starting from the partial solution obtained in the first step and considering component $G_2$.

More generally, in many cases an optimization problem $Opt$ over a given graph $G$ can be solved by decomposing $G$ into separated subgraphs $G_1, \ldots, G_k$ and computing the $k$ subproblems. Thus, letting $Opt(G_i, O_{i-1})$ be the optimization problem over the (sub-)graph $G_i$ using the partial solution $O_{i-1}$, we say that $Opt$ is decomposable if $Opt(G, \emptyset) = Opt(G_k, O_{k-1})$, where $O_1 = Opt(G_1, \emptyset)$ and $O_j = Opt(G_j, O_{j-1})$ for $j \in [2..k]$.
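The chain of partial solutions in this definition is a left fold over the components. The sketch below illustrates it with one concrete choice of objective, the minimum number of colors of a proper coloring, for which the value on a disconnected graph is the maximum over its components. The functions `min_colors`, `opt` and `solve_decomposed` are hypothetical names, the empty partial solution is represented by 0, and the brute-force search is meant for tiny graphs only.

```python
from functools import reduce
from itertools import product

def min_colors(nodes, edges):
    """Smallest number of colors in a proper coloring (brute force)."""
    ordered = sorted(nodes)
    for k in range(1, len(ordered) + 1):
        for assignment in product(range(k), repeat=len(ordered)):
            color = dict(zip(ordered, assignment))
            if all(color[u] != color[v] for u, v in edges):
                return k
    return 0

def opt(subgraph, partial):
    """Opt(G_j, O_{j-1}): colors needed once component G_j is added."""
    nodes, edges = subgraph
    return max(partial, min_colors(nodes, edges))

def solve_decomposed(subgraphs):
    """O_1 = Opt(G_1, empty), O_j = Opt(G_j, O_{j-1}); returns O_k."""
    return reduce(lambda partial, g: opt(g, partial), subgraphs, 0)

components = [
    ({1, 2, 3}, [(1, 2), (2, 3), (1, 3)]),  # triangle
    ({4, 5}, [(4, 5)]),                     # single edge
    ({6}, []),                              # isolated node
]
all_nodes = set().union(*(n for n, _ in components))
all_edges = [e for _, es in components for e in es]

print(solve_decomposed(components), min_colors(all_nodes, all_edges))
```

On this input the folded value and the value computed on the whole graph coincide, which is what decomposability requires.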

MAX CLIQUE. Let us consider the program of Example 12 and let us supply the query $(c(1), MAX\_CLIQUE)$, using a graph consisting of two components with the structure shown in Figure 5. The experimental results are shown in Fig. 8. In particular, Fig. 8(i) shows the execution time for a graph consisting of two components having the same size. Fig. 8(ii) shows the difference in execution time between the source program and the optimized one for different sizes of the second component. Note that in Fig. 8(ii), if the second component is empty (0 nodes), the source program performs a little better than the optimized one, as the latter has an overhead due to the instantiation and the computation of the magic rules. The advantage of using the optimized program becomes more evident as the size of the second component increases.

Fig. 8. The Max Clique Problem.

It is worth noting that the previous example is a prototype of the guess-and-check paradigm which, as already pointed out, has proved to be the most intuitive way of expressing NP (optimization) problems.

MIN COLORING. Given a graph $G = (V, E)$, a coloring for $G$ is minimum if, in the assignment of colors to vertices, it uses the minimum number of colors. The disjunctive Datalog program modelling the MIN_COLORING problem is the following:



col(X, I) ∨ ncol(X, I) ← node(X), color(I).

← col(X, I), col(Y, I), edge(X, Y).
← col(X, I), col(X, J), I ≠ J.
← node(X), not colored(X).

colored(X) ← col(X, I).
used(I) ← col(X, I).

⇐ used(I).

The first rule guesses a coloring for the graph; the set of strong constraints checks that two adjacent vertices do not have the same color and that each vertex is assigned exactly one color; the weak constraint requires that the number of colors used is minimum.

The query $(col(1, c1), MIN\_COLORING)$ is supplied on different graph topologies. Some of the results are shown in Figure 9.

Fig. 9. Execution time for the Min coloring problem.



In particular, the results obtained by increasing the number of disconnected components, and by increasing the cardinality of the components, have been investigated. Figure 9(i) and Figure 9(ii) show, respectively, the performance of the optimized program and of the source program with a different number of components. In all the experiments the first component contains node 1.

6 Conclusions

In this paper a technique has been proposed for the optimization of bound queries 
over disjunctive deductive databases with constraints (a simple and expressive form 
of unstratified negation), which extends a previous technique suitable for disjunctive 
Datalog programs. As the usual way of expressing declaratively hard problems is
based on the guess-and-check technique, where the guess part is expressed by means
of disjunctive rules and the check part is expressed by means of constraints, the 
technique proposed here is highly relevant for the optimization of queries expressing 
hard problems. 

The proposed approach is based on the use of a binding propagation technique which, by reducing the size of the data relevant to answer the query, reduces both the complexity of computing a single model and the overall number of models to be considered.

The main contribution of the paper lies in the definition of a rewriting algorithm 
which systematically utilizes the query goal to propagate the binding through both 
the rules and the constraints thereby avoiding the computation of useless models. 
An interesting peculiarity of the formalization proposed here is that it is completely 
independent of the particular strategy adopted for propagating the binding: in this 
way, the results are completely orthogonal to the Magic-Set technique in itself, and, 
hence, to the results in (Greco, 2003). The value of the technique has been demonstrated
by several experiments.

Acknowledgement. The authors thank Nicola Leone and Wolfgang Faber for useful 
suggestions.